Search CORE

513 research outputs found

Expressiveness of Streaming String Transducers

Author: Alur Rajeev
Publication venue: ScholarlyCommons
Publication date: 01/01/2010
Field of study

Streaming string transducers define (partial) functions from input strings to output strings. A streaming string transducer makes a single pass through the input string and uses a finite set of variables that range over strings from the output alphabet. At every step, the transducer processes an input symbol, and updates all the variables in parallel using assignments whose right-hand-sides are concatenations of output symbols and variables with the restriction that a variable can be used at most once in a right-hand-side expression. It has been shown that streaming string transducers operating on strings over infinite data domains are of interest in algorithmic verification of list-processing programs, as they lead to Pspace decision procedures for checking pre/postconditions and for checking semantic equivalence, for a well-defined class of heap-manipulating programs. In order to understand the theoretical expressiveness of streaming transducers, we focus on streaming transducers processing strings over finite alphabets, given the existence of a robust and well-studied class of ``regular\u27\u27 transductions for this case. Such regular transductions can be defined either by two-way deterministic finite-state transducers, or using a logical MSO-based characterization. Our main result is that the expressiveness of streaming string transducers coincides exactly with this class of regular transductions

Dagstuhl Research Online Publication Server

ScholarlyCommons@Penn

LIPIcs

Author: Alur Rajeev
Cerny Pavol
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2010
Field of study

Streaming string transducers [1] define (partial) functions from input strings to output strings. A streaming string transducer makes a single pass through the input string and uses a finite set of variables that range over strings from the output alphabet. At every step, the transducer processes an input symbol, and updates all the variables in parallel using assignments whose right-hand-sides are concatenations of output symbols and variables with the restriction that a variable can be used at most once in a right-hand-side expression. It has been shown that streaming string transducers operating on strings over infinite data domains are of interest in algorithmic verification of list-processing programs, as they lead to PSPACE decision procedures for checking pre/post conditions and for checking semantic equivalence, for a well-defined class of heap-manipulating programs. In order to understand the theoretical expressiveness of streaming transducers, we focus on streaming transducers processing strings over finite alphabets, given the existence of a robust and well-studied class of "regular" transductions for this case. Such regular transductions can be defined either by two-way deterministic finite-state transducers, or using a logical MSO-based characterization. Our main result is that the expressiveness of streaming string transducers coincides exactly with this class of regular transductions

IST Austria: PubRep (Institute of Science and Technology)

IST PubRep

Regular Combinators for String Transformations

Author: Alur R.
Bojanczyk M.
Chytil M.
Courcelle B.
Publication venue
Publication date: 12/02/2014
Field of study

We focus on (partial) functions that map input strings to a monoid such as the set of integers with addition and the set of output strings with concatenation. The notion of regularity for such functions has been defined using two-way finite-state transducers, (one-way) cost register automata, and MSO-definable graph transformations. In this paper, we give an algebraic and machine-independent characterization of this class analogous to the definition of regular languages by regular expressions. When the monoid is commutative, we prove that every regular function can be constructed from constant functions using the combinators of choice, split sum, and iterated sum, that are analogs of union, concatenation, and Kleene-*, respectively, but enforce unique (or unambiguous) parsing. Our main result is for the general case of non-commutative monoids, which is of particular interest for capturing regular string-to-string transformations for document processing. We prove that the following additional combinators suffice for constructing all regular functions: (1) the left-additive versions of split sum and iterated sum, which allow transformations such as string reversal; (2) sum of functions, which allows transformations such as copying of strings; and (3) function composition, or alternatively, a new concept of chained sum, which allows output values from adjacent blocks to mix.Comment: This is the full version, with omitted proofs and constructions, of the conference paper currently in submissio

arXiv.org e-Print Archive

CiteSeerX

Crossref

Two-Way Visibly Pushdown Automata and Transducers

Author: Alur R.
Bojanczyk M.
Chytil M.
Courcelle B.
Filiot E.
Filiot E.
Filiot E.
Gurari E.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016
Field of study

Automata-logic connections are pillars of the theory of regular languages. Such connections are harder to obtain for transducers, but important results have been obtained recently for word-to-word transformations, showing that the three following models are equivalent: deterministic two-way transducers, monadic second-order (MSO) transducers, and deterministic one-way automata equipped with a finite number of registers. Nested words are words with a nesting structure, allowing to model unranked trees as their depth-first-search linearisations. In this paper, we consider transformations from nested words to words, allowing in particular to produce unranked trees if output words have a nesting structure. The model of visibly pushdown transducers allows to describe such transformations, and we propose a simple deterministic extension of this model with two-way moves that has the following properties: i) it is a simple computational model, that naturally has a good evaluation complexity; ii) it is expressive: it subsumes nested word-to-word MSO transducers, and the exact expressiveness of MSO transducers is recovered using a simple syntactic restriction; iii) it has good algorithmic/closure properties: the model is closed under composition with a unambiguous one-way letter-to-letter transducer which gives closure under regular look-around, and has a decidable equivalence problem

arXiv.org e-Print Archive

Crossref

HAL AMU

DI-fusion

Stream Processing using Grammars and Regular Expressions

Author: Rasmussen Ulrik Terp
Publication venue
Publication date: 01/01/2016
Field of study

In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot

arXiv.org e-Print Archive

Copenhagen University Research Information System

In the Maze of Data Languages

Author: D'Antoni Loris
Publication venue
Publication date: 29/08/2012
Field of study

In data languages the positions of strings and trees carry a label from a finite alphabet and a data value from an infinite alphabet. Extensions of automata and logics over finite alphabets have been defined to recognize data languages, both in the string and tree cases. In this paper we describe and compare the complexity and expressiveness of such models to understand which ones are better candidates as regular models

arXiv.org e-Print Archive

CiteSeerX

Streaming Tree Transducers

Author: B. Courcelle
F. Neven
H. Hosoya
H. Seidl
J. Engelfriet
J. Engelfriet
J. Engelfriet
J. Engelfriet
J. Esparza
P. Madhusudan
W. Martens
Publication venue
Publication date: 01/01/2012
Field of study

Theory of tree transducers provides a foundation for understanding expressiveness and complexity of analysis problems for specification languages for transforming hierarchically structured data such as XML documents. We introduce streaming tree transducers as an analyzable, executable, and expressive model for transforming unranked ordered trees in a single pass. Given a linear encoding of the input tree, the transducer makes a single left-to-right pass through the input, and computes the output in linear time using a finite-state control, a visibly pushdown stack, and a finite number of variables that store output chunks that can be combined using the operations of string-concatenation and tree-insertion. We prove that the expressiveness of the model coincides with transductions definable using monadic second-order logic (MSO). Existing models of tree transducers either cannot implement all MSO-definable transformations, or require regular look ahead that prohibits single-pass implementation. We show a variety of analysis problems such as type-checking and checking functional equivalence are solvable for our model.Comment: 40 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

ScholarlyCommons@Penn