29 research outputs found

    Streamability of nested word transductions

    Full text link
    We consider the problem of evaluating in streaming (i.e., in a single left-to-right pass) a nested word transduction with a limited amount of memory. A transduction T is said to be height bounded memory (HBM) if it can be evaluated with a memory that depends only on the size of T and on the height of the input word. We show that it is decidable in coNPTime for a nested word transduction defined by a visibly pushdown transducer (VPT), if it is HBM. In this case, the required amount of memory may depend exponentially on the height of the word. We exhibit a sufficient, decidable condition for a VPT to be evaluated with a memory that depends quadratically on the height of the word. This condition defines a class of transductions that strictly contains all determinizable VPTs

    Beyond Language Equivalence on Visibly Pushdown Automata

    Full text link
    We study (bi)simulation-like preorder/equivalence checking on the class of visibly pushdown automata and its natural subclasses visibly BPA (Basic Process Algebra) and visibly one-counter automata. We describe generic methods for proving complexity upper and lower bounds for a number of studied preorders and equivalences like simulation, completed simulation, ready simulation, 2-nested simulation preorders/equivalences and bisimulation equivalence. Our main results are that all the mentioned equivalences and preorders are EXPTIME-complete on visibly pushdown automata, PSPACE-complete on visibly one-counter automata and P-complete on visibly BPA. Our PSPACE lower bound for visibly one-counter automata improves also the previously known DP-hardness results for ordinary one-counter automata and one-counter nets. Finally, we study regularity checking problems for visibly pushdown automata and show that they can be decided in polynomial time.Comment: Final version of paper, accepted by LMC

    Generalizing input-driven languages: theoretical and practical benefits

    Get PDF
    Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks to their simplicity they enjoy various nice algebraic and logic properties that have been successfully exploited in many application fields. Practically all of their related problems are decidable, so that they support automatic verification algorithms. Also, they can be recognized in real-time. Context-free languages (CFL) are another major family well-suited to formalize programming, natural, and many other classes of languages; their increased generative power w.r.t. RL, however, causes the loss of several closure properties and of the decidability of important problems; furthermore they need complex parsing algorithms. Thus, various subclasses thereof have been defined with different goals, spanning from efficient, deterministic parsing to closure properties, logic characterization and automatic verification techniques. Among CFL subclasses, so-called structured ones, i.e., those where the typical tree-structure is visible in the sentences, exhibit many of the algebraic and logic properties of RL, whereas deterministic CFL have been thoroughly exploited in compiler construction and other application fields. After surveying and comparing the main properties of those various language families, we go back to operator precedence languages (OPL), an old family through which R. Floyd pioneered deterministic parsing, and we show that they offer unexpected properties in two fields so far investigated in totally independent ways: they enable parsing parallelization in a more effective way than traditional sequential parsers, and exhibit the same algebraic and logic properties so far obtained only for less expressive language families

    Programming Using Automata and Transducers

    Get PDF
    Automata, the simplest model of computation, have proven to be an effective tool in reasoning about programs that operate over strings. Transducers augment automata to produce outputs and have been used to model string and tree transformations such as natural language translations. The success of these models is primarily due to their closure properties and decidable procedures, but good properties come at the price of limited expressiveness. Concretely, most models only support finite alphabets and can only represent small classes of languages and transformations. We focus on addressing these limitations and bridge the gap between the theory of automata and transducers and complex real-world applications: Can we extend automata and transducer models to operate over structured and infinite alphabets? Can we design languages that hide the complexity of these formalisms? Can we define executable models that can process the input efficiently? First, we introduce succinct models of transducers that can operate over large alphabets and design BEX, a language for analysing string coders. We use BEX to prove the correctness of UTF and BASE64 encoders and decoders. Next, we develop a theory of tree transducers over infinite alphabets and design FAST, a language for analysing tree-manipulating programs. We use FAST to detect vulnerabilities in HTML sanitizers, check whether augmented reality taggers conflict, and optimize and analyze functional programs that operate over lists and trees. Finally, we focus on laying the foundations of stream processing of hierarchical data such as XML files and program traces. We introduce two new efficient and executable models that can process the input in a left-to-right linear pass: symbolic visibly pushdown automata and streaming tree transducers. Symbolic visibly pushdown automata are closed under Boolean operations and can specify and efficiently monitor complex properties for hierarchical structures over infinite alphabets. Streaming tree transducers can express and efficiently process complex XML transformations while enjoying decidable procedures

    Diagnosis and Opacity Problems for Infinite State Systems Modeled by Recursive Tile Systems

    Get PDF
    International audienceThe analysis of discrete event systems under partial observation is an important topic, with major applications such as the detection of information flow and the diagnosis of faulty behaviors. These questions have, mostly, not been addressed for classical models of recursive systems, such as pushdown systems and recursive state machines. In this paper, we consider recursive tile systems, which are recursive infinite systems generated by a finite collection of finite tiles, a simplified variant of deterministic graph grammars (slightly more general than pushdown systems). Since these systems are infinite-state in general powerset constructions for monitoring do not always apply. We exhibit computable conditions on recursive tile systems and present non-trivial constructions that yield effective computation of the monitors.We apply these results to the classic problems of state-based opacity and diagnosability (off-line verification of opacity and diagnosability, and also run-time monitoring of these properties). For a decidable subclass of recursive tile systems, we also establish the decidability of the problems of state-based opacity and diagnosability

    Tree-Structured Problems and Parallel Computation

    Get PDF
    Turing-Maschinen sind das klassische Beschreibungsmittel für Wortsprachen und werden daher auch benützt, um Komplexitätsklassen zu definieren. Dies geschieht zum Beispiel durch das Einschränken des Platz- oder Zeitaufwandes der Berechnung zur Lösung eines Problems. Für sehr niedrige Komplexität wie etwa sublineare Laufzeit, werden Schaltkreise verwendet. Schaltkreise können auf natürliche Art Komplexitäten wie etwa logarithmische Laufzeit modellieren. Ebenso können sie als eine Art paralleles Rechenmodell gesehen werden. Eine wichtige parallele Komplexitätsklasse ist NC1. Sie wird beschrieben durch Boolesche Schaltkreise logarithmischer Tiefe und beschränktem Eingangsgrad der Gatter. Eine initiale Beobachtung, die die vorliegende Arbeit motiviert, ist, dass viele schwere Probleme in NC1 eine ähnliche Struktur haben und auf ähnliche Art und Weise gelöst werden. Das Auswertungsproblem für Boolesche Formeln ist eines der repräsentativsten Probleme aus dieser Klasse: Gegeben ist hier eine aussagenlogische Formel samt Belegung für die Variablen; gefragt ist, ob sie zu wahr oder zu falsch auswertet. Dieses Problem wird in NC1 gelöst durch den Algorithmus von Buss. Auf ähnliche Art können arithmetische Formeln in #NC1 ausgewertet oder das Wortproblem für Visibly-Pushdown-Sprachen gelöst werden. Zu besagter Klasse an Problemen gehört auch Courcelles Theorem, welches Berechnungen in Baumautomaten involviert. Zu bemerken ist, dass alle angesprochenen Probleme gemeinsam haben, dass sie aus Instanzen bestehen, die baumartig sind. Formeln sind Bäume, Visibly-Pushdown-Sprachen enthalten als Wörter kodierte Bäume und Courcelles Theorem betrachtet Graphen mit beschränkter Baumweite, d.h. Graphen, die sich als Baum darstellen lassen. Insbesondere Letzteres ist ein Schema, das häufiger auftritt. Zum Beispiel gibt es NP-vollständige Graphprobleme wie das Finden von Hamilton-Kreisen, welches unter beschränkter Baumweite in P fällt. Neuere Analysen konnten diese Schranke weiter zu SAC1 verbessern, was eine parallele Komplexitätsklasse ist. Die angesprochenen Probleme kommen aus unterschiedlichen Bereichen und haben individuelle Lösungen. Hauptthese dieser Arbeit ist, dass sich diese Vielfalt vereinheitlichen lässt. Es wird ein generisches Lösungskonzept vorgestellt, welches darauf beruht, dass sich die Probleme auf ein Termevaluierungsproblem reduzieren lassen. Kernstück ist daher ein Termevaluierungsalgorithmus, der unabhängig von der Algebra, über welche der Term evaluiert werden soll, ist. Resultat ist, dass eine Vielzahl, darunter die oben angesprochenen Probleme, sich auf analoge Art lösen lassen, und dass sich ebenso leicht neue Resultate zeigen lassen. Diese Menge an Resultaten hätte sich ohne den vereinheitlichten Lösungsansatz nicht innerhalb des Rahmens einer Arbeit wie der vorliegenden zeigen lassen. Der entwickelte Lösungsansatz führt stets zu Schaltkreisfamilien polylogarithmischer Tiefe. Es wird jedoch auch die Frage behandelt, wie mächtig Schaltkreisfamilien konstanter Tiefe noch bezüglich Termevaluierung sind. Die Klasse AC0 ist hierfür ein natürlicher Kandidat; sie entspricht der Menge der Sprachen, die durch Logik erster Ordung beschreibbar sind. Um dieses Problem anzugehen, wird zunächst das Termevaluierungsproblem über endlichen Algebren betrachtet. Dieses wiederum lässt sich in das Wortproblem von Visibly-Pushdown-Sprachen einbetten. Daher handelt dieser Teil der Arbeit vornehmlich von der Beschreibbarkeit von Visibly-Pushdown-Sprachen in Logik erster Ordnung. Hierbei treten ungelöste Probleme zu Tage, welche ein Indiz dafür sind, wie schlecht die Komplexität konstanter Tiefe bisher noch verstanden ist, und das, trotz des Resultats von Furst, Saxe und Sipser, bzw. Håstads. Die bis jetzt beschrieben Inhalte sind Teil einer kontinuierlichen Entwicklung. Es gibt jedoch ein Thema in dieser Arbeit, das orthogonal dazu ist: Automaten und im speziellen Cost-Register-Automaten. Zum einen sind, wie oben angedeutet, Automaten Beispiele für Anwendungen des hier entwickelten generischen Lösungsansatzes. Zum anderen können sie selbst zur Beschreibung von Termevaluierungsproblemen dienen; so können Visibly-Pushdown-Automaten Termevaluierung über endlichen Algebren ausführen. Um über endliche Algebren hinauszugehen, benötigen die Automaten mehr Speicher. Visibly-Pushdown-Automaten haben einen Keller, der genau dafür geeignet ist, die Baumstruktur einer Eingabeformel zu verifizieren. Für nichtendliche Algebren eignet sich ein Modell, welches hier vorgestellt werden soll. Es kombiniert Visibly-Pushdown-Automaten mit Cost-Register-Automaten. Ein Cost-Register-Automat ist ein endlicher Automat, welcher mit zusätzlichen Registern ausgestattet ist. Die Register können Werte einer Algebra speichern und werden in jedem Schritt in Abhängigkeit des Eingabezeichens und des Zustandes aktualisiert. Dieser Einwegdatenfluss von Zuständen zu Registern sorgt dafür, dass dieses Modell nicht nur entscheidbar bleibt, sondern, in Abhängigkeit der Algebra, auch niedrige Komplexität hat. Das neue Modell der Cost-Register-Visibly-Pushdown-Automaten kann nun Terme evaluieren. Es werden grundlegende Eigenschaften gezeigt, einschließlich Komplexitätsaussagen

    Logic and Automata

    Get PDF
    Mathematical logic and automata theory are two scientific disciplines with a fundamentally close relationship. The authors of Logic and Automata take the occasion of the sixtieth birthday of Wolfgang Thomas to present a tour d'horizon of automata theory and logic. The twenty papers in this volume cover many different facets of logic and automata theory, emphasizing the connections to other disciplines such as games, algorithms, and semigroup theory, as well as discussing current challenges in the field

    Stream Processing using Grammars and Regular Expressions

    Full text link
    In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot

    Symbolic Model Learning: New Algorithms and Applications

    Get PDF
    In this thesis, we study algorithms which can be used to extract, or learn, formal mathematical models from software systems and then using these models to test whether the given software systems satisfy certain security properties such as robustness against code injection attacks. Specifically, we focus on studying learning algorithms for automata and transducers and the symbolic extensions of these models, namely symbolic finite automata (SFAs). In a high level, this thesis contributes the following results: 1. In the first part of the thesis, we present a unified treatment of many common variations of the seminal L* algorithm for learning deterministic finite automata (DFAs) as a congruence learning algorithm for the underlying Nerode congruence which forms the basis of automata theory. Under this formulation the basic data structures used by different variations are unified as different ways to implement the Nerode congruence using queries. 2. Next, building on the new formulation of L*-style algorithms we proceed to develop new algorithms for learning transducer models. Firstly, we present the first algorithm for learning deterministic partial transducers. Furthermore, we extend my algorithm into non-deterministic models by introducing a novel, generalized congruence relation over string transformations which is able to capture a subclass of string transformations with regular lookahead. We demonstrate that this class is able to capture many practical string transformation from the domain of string sanitizers in Web applications. 3. Classical learning algorithms for automata and transducers operate over finite alphabets and have a query complexity that scales linearly with the size of the alphabet. However, in practice, this dependence on the alphabet size hinders the performance of the algorithms. To address this issue, we develop the MAT* algorithm for learning symbolic finite state automata (SFAs) which operate over infinite alphabets. In practice, the MAT* learning algorithm allow us to plug custom transition learning algorithms which will efficiently infer the predicates in the transitions of the SFA without querying the whole alphabet set. 4. Finally, we use our learning algorithm toolbox as the basis for the development of a set of black-box testing algorithms. More specifically, we present Grammar Oriented Filter Auditing (GOFA), a novel technique which allows one to utilize my learning algorithms to evaluate the robustness of a string sanitizer or filter against a set of attack strings given as a context-free grammar. Furthermore, because such grammars are many times unavailable, we developed sfadiff a differential testing technique based on symbolic automata learning which can be used in order to perform differential testing of two different parser implementations using SFA learning algorithms and we demonstrate how our algorithm can be used to develop program fingerprints. We evaluate our algorithms against state-of-the-art Web Application Firewalls and discover over 15 previously unknown vulnerabilities which result in evading the firewalls and performing code injection attacks in the backend Web application. Finally, we show how our learning algorithms can uncover vulnerabilities which are missed by other black-box methods such as fuzzing and grammar-based testing

    Foundations of Software Science and Computation Structures

    Get PDF
    This open access book constitutes the proceedings of the 22nd International Conference on Foundations of Software Science and Computational Structures, FOSSACS 2019, which took place in Prague, Czech Republic, in April 2019, held as part of the European Joint Conference on Theory and Practice of Software, ETAPS 2019. The 29 papers presented in this volume were carefully reviewed and selected from 85 submissions. They deal with foundational research with a clear significance for software science
    corecore