238 research outputs found

    Stream Processing using Grammars and Regular Expressions

    Full text link
    In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot

    Optimally Streaming Greedy Regular Expression Parsing

    Get PDF
    Abstract. We study the problem of streaming regular expression parsing: Given a regular expression and an input stream of symbols, how to output a serialized syntax tree representation as an output stream during input stream processing. We show that optimally streaming regular expression parsing, outputting bits of the output as early as is semantically possible for any regular expression of size m and any input string of length n, can be performed in time O(2 m log m + mn) on a unit-cost random-access machine. This is for the wide-spread greedy disambiguation strategy for choosing parse trees of grammatically ambiguous regular expressions. In particular, for a fixed regular expression, the algorithm's run-time scales linearly with the input string length. The exponential is due to the need for preprocessing the regular expression to analyze state coverage of its associated NFA, a PSPACE-hard problem, and tabulating all reachable ordered sets of NFA-states. Previous regular expression parsing algorithms operate in multiple phases, always requiring processing or storing the whole input string before outputting the first bit of output, not only for those regular expressions and input prefixes where reading to the end of the input is strictly necessary

    Proceedings

    Get PDF
    Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231

    POSIX Lexing with Bitcoded Derivatives

    Get PDF

    Efficient Automata Techniques and Their Applications

    Get PDF
    Tato práce se zabývá vývojem efektivních technik pro konečné automaty a jejich aplikace. Zejména se věnujeme konečným automatům použitých pří detekci útoků v síťovém provozu a automatům v rozhodovacích procedurách a verifikaci. V první části práce navrhujeme techniky přibližné redukce nedeterministických automatů, které snižují spotřebu zdrojů v hardwarově akcelerovaném zkoumání obsahu paketů. Druhá část práce je je věnována automatům v rozhodovacích procedurách, zejména slabé monadické logice druhého řádů k následníků (WSkS) a teorie nad řetězci. Navrhujeme novou rozhodovací proceduru pro WS2S založenou na automatových termech, umožňující efektivně prořezávat stavový prostor. Dále studujeme techniky předzpracování WSkS formulí za účelem snížení velikosti konstruovaných automatů. Automaty jsme také aplikovali v rozhodovací proceduře teorie nad řetězci pro efektivní reprezentaci důkazového stromu. V poslední části práce potom navrhujeme optimalizace rank-based komplementace Buchiho automatů, které snižuje počet generovaných stavů během konstrukce komplementu.This thesis develops efficient techniques for finite automata and their applications. In particular, we focus on finite automata in network intrusion detection and automata in decision procedures and verification. In the first part of the thesis, we propose techniques of approximate reduction of nondeterministic automata decreasing consumption of resources of hardware-accelerated deep packet inspection. The second part is devoted to automata in decision procedures, in particular, to weak monadic second-order logic of k successors (WSkS) and the theory of strings. We propose a novel decision procedure for WS2S based on automata terms allowing one to effectively prune the state space. Further, we study techniques of WSkS formulae preprocessing intended to reduce the sizes of constructed intermediate automata. Moreover, we employ automata in a decision procedure of the theory of strings for efficient handling of the proof graph. The last part of the thesis then proposes optimizations in rank-based Buchi automata complementation reducing the number of generated states during the construction.

    Acta Cybernetica : Volume 20. Number 3.

    Get PDF
    corecore