11 research outputs found
Streamability of nested word transductions
We consider the problem of evaluating in streaming (i.e., in a single
left-to-right pass) a nested word transduction with a limited amount of memory.
A transduction T is said to be height bounded memory (HBM) if it can be
evaluated with a memory that depends only on the size of T and on the height of
the input word. We show that it is decidable in coNPTime for a nested word
transduction defined by a visibly pushdown transducer (VPT), if it is HBM. In
this case, the required amount of memory may depend exponentially on the height
of the word. We exhibit a sufficient, decidable condition for a VPT to be
evaluated with a memory that depends quadratically on the height of the word.
This condition defines a class of transductions that strictly contains all
determinizable VPTs
Two-Way Visibly Pushdown Automata and Transducers
Automata-logic connections are pillars of the theory of regular languages.
Such connections are harder to obtain for transducers, but important results
have been obtained recently for word-to-word transformations, showing that the
three following models are equivalent: deterministic two-way transducers,
monadic second-order (MSO) transducers, and deterministic one-way automata
equipped with a finite number of registers. Nested words are words with a
nesting structure, allowing to model unranked trees as their depth-first-search
linearisations. In this paper, we consider transformations from nested words to
words, allowing in particular to produce unranked trees if output words have a
nesting structure. The model of visibly pushdown transducers allows to describe
such transformations, and we propose a simple deterministic extension of this
model with two-way moves that has the following properties: i) it is a simple
computational model, that naturally has a good evaluation complexity; ii) it is
expressive: it subsumes nested word-to-word MSO transducers, and the exact
expressiveness of MSO transducers is recovered using a simple syntactic
restriction; iii) it has good algorithmic/closure properties: the model is
closed under composition with a unambiguous one-way letter-to-letter transducer
which gives closure under regular look-around, and has a decidable equivalence
problem
Visibly Pushdown Transducers with Well-nested Outputs
Visibly pushdown transducers (VPTs) are visibly pushdown automata extended with outputs. They have been introduced to model transformations of nested words, i.e. words with a call/return structure. When outputs are also structured and well nested words, VPTs are a natural formalism to express tree transformations evaluated in streaming. We prove the class of VPTs with well-nested outputs to be decidable in PTIME. Moreover, we show that this class is closed under composition and that its type-checking against visibly pushdown languages is decidable
XQuery Streaming by Forest Transducers
Streaming of XML transformations is a challenging task and only very few
systems support streaming. Research approaches generally define custom
fragments of XQuery and XPath that are amenable to streaming, and then design
custom algorithms for each fragment. These languages have several shortcomings.
Here we take a more principles approach to the problem of streaming
XQuery-based transformations. We start with an elegant transducer model for
which many static analysis problems are well-understood: the Macro Forest
Transducer (MFT). We show that a large fragment of XQuery can be translated
into MFTs --- indeed, a fragment of XQuery, that can express important features
that are missing from other XQuery stream engines, such as GCX: our fragment of
XQuery supports XPath predicates and let-statements. We then rely on a
streaming execution engine for MFTs, one which uses a well-founded set of
optimizations from functional programming, such as strictness analysis and
deforestation. Our prototype achieves time and memory efficiency comparable to
the fastest known engine for XQuery streaming, GCX. This is surprising because
our engine relies on the OCaml built in garbage collector and does not use any
specialized buffer management, while GCX's efficiency is due to clever and
explicit buffer management.Comment: Full version of the paper in the Proceedings of the 30th IEEE
International Conference on Data Engineering (ICDE 2014
Programming Using Automata and Transducers
Automata, the simplest model of computation, have proven to be an effective tool in reasoning about programs that operate over strings. Transducers augment automata to produce outputs and have been used to model string and tree transformations such as natural language translations. The success of these models is primarily due to their closure properties and decidable procedures, but good properties come at the price of limited expressiveness. Concretely, most models only support finite alphabets and can only represent small classes of languages and transformations. We focus on addressing these limitations and bridge the gap between the theory of automata and transducers and complex real-world applications: Can we extend automata and transducer models to operate over structured and infinite alphabets? Can we design languages that hide the complexity of these formalisms? Can we define executable models that can process the input efficiently? First, we introduce succinct models of transducers that can operate over large alphabets and design BEX, a language for analysing string coders. We use BEX to prove the correctness of UTF and BASE64 encoders and decoders. Next, we develop a theory of tree transducers over infinite alphabets and design FAST, a language for analysing tree-manipulating programs. We use FAST to detect vulnerabilities in HTML sanitizers, check whether augmented reality taggers conflict, and optimize and analyze functional programs that operate over lists and trees. Finally, we focus on laying the foundations of stream processing of hierarchical data such as XML files and program traces. We introduce two new efficient and executable models that can process the input in a left-to-right linear pass: symbolic visibly pushdown automata and streaming tree transducers. Symbolic visibly pushdown automata are closed under Boolean operations and can specify and efficiently monitor complex properties for hierarchical structures over infinite alphabets. Streaming tree transducers can express and efficiently process complex XML transformations while enjoying decidable procedures
Trimming Visibly Pushdown Automata
Abstract We study the problem of trimming visibly pushdown automata (VPA). We first describe a polynomial time procedure which, given a visibly pushdown automaton that accepts only well-nested words, returns an equivalent visibly pushdown automaton that is trimmed. We then show how this procedure can be lifted to the setting of arbitrary VPA. Furthermore, we present a way of building, given a VPA, an equivalent VPA which is both deterministic and trimmed. Last, our trimming procedures can be applied to weighted VPA
Safe Programming Over Distributed Streams
The sheer scale of today\u27s data processing needs has led to a new paradigm of software systems centered around requirements for high-throughput, distributed, low-latency computation.Despite their widespread adoption, existing solutions have yet to provide a programming model with safe semantics -- and they disagree on basic design choices, in particular with their approach to parallelism. As a result, naive programmers are easily led to introduce correctness and performance bugs.
This work proposes a reliable programming model for modern distributed stream processing, founded in a type system for partially ordered data streams. On top of the core type system, we propose language abstractions for working with streams -- mechanisms to build stream operators with (1) type-safe compositionality, (2) deterministic distribution, (3) run-time testing, and (4) static performance bounds. Our thesis is that viewing streams as partially ordered conveniently exposes parallelism without compromising safety or determinism. The ideas contained in this work are implemented in a series of open source software projects, including the Flumina, DiffStream, and Data Transducers libraries