    From Finite Automata to Regular Expressions and Back--A Summary on Descriptional Complexity

    The equivalence of finite automata and regular expressions dates back to Kleene's seminal 1956 paper on the representation of events in nerve nets and finite automata. In the present paper we tour a fragment of the literature and summarize results on upper and lower bounds for the conversion of finite automata to regular expressions and vice versa. We also briefly recall the known bounds for the removal of spontaneous transitions (epsilon-transitions) from non-epsilon-free nondeterministic devices. Moreover, we report on recent results on average-case descriptional complexity bounds for the conversion of regular expressions to finite automata, and on brand-new developments concerning the state elimination algorithm that converts finite automata to regular expressions.
    Comment: In Proceedings AFL 2014, arXiv:1405.527
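
    The conversion the survey centers on is usually presented via the state elimination algorithm mentioned in the abstract. Below is a minimal Python sketch of that textbook technique; the dictionary-based automaton encoding and the function name are our own illustration, not the paper's notation, and no simplification of intermediate expressions is attempted, which is precisely where the size blow-up studied by the survey arises.

        def eliminate_states(states, start, accept, edges):
            """Turn a finite automaton into a regular expression by removing
            intermediate states one at a time.  `edges` maps a state pair
            (p, q) to the regex labelling the transition from p to q."""
            for s in [x for x in states if x not in (start, accept)]:
                loop = edges.pop((s, s), None)            # self-loop on s, if any
                loop_part = f"({loop})*" if loop is not None else ""
                ins = [(p, r) for (p, q), r in edges.items() if q == s]
                outs = [(q, r) for (p, q), r in edges.items() if p == s]
                for p, rin in ins:                        # reroute every path p -> s -> q
                    for q, rout in outs:
                        new = f"({rin}){loop_part}({rout})"
                        old = edges.get((p, q))
                        edges[(p, q)] = f"{old}|{new}" if old is not None else new
                for key in [k for k in edges if s in k]:  # drop edges touching s
                    del edges[key]
            return edges.get((start, accept))

        # q0 --a--> q1 --c--> q2, with a self-loop b on q1:
        print(eliminate_states(["q0", "q1", "q2"], "q0", "q2",
                               {("q0", "q1"): "a", ("q1", "q1"): "b",
                                ("q1", "q2"): "c"}))      # (a)(b)*(c)

    The elimination order is a free choice, and different orders can produce regular expressions of very different sizes, which is one knob the recent developments reported in the survey turn.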

    Joining Extractions of Regular Expressions

    Regular expressions with capture variables, also known as "regex formulas," extract relations of spans (interval positions) from text. These relations can be further manipulated via Relational Algebra, as studied in the context of document spanners, Fagin et al.'s formal framework for information extraction. We investigate the complexity of querying text with Conjunctive Queries (CQs) and Unions of CQs (UCQs) on top of regex formulas. We show that the lower bounds (NP-completeness and W[1]-hardness) from the relational world also hold in our setting; in particular, hardness already arises over single-character text. Yet, the upper bounds from the relational world do not carry over: unlike in the relational case, acyclic CQs, and even gamma-acyclic CQs, are hard to evaluate. The source of hardness is that it may be intractable to materialize the relation defined by a regex formula, simply because it has an exponential number of tuples. Nevertheless, we establish general upper bounds. In particular, UCQs can be evaluated with polynomial delay, provided that every CQ has a bounded number of atoms (while unions and projection can be arbitrary). Furthermore, UCQ evaluation is solvable with FPT (Fixed-Parameter Tractable) delay when the parameter is the size of the UCQ.
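
    To make the data model concrete: a regex formula binds capture variables to spans, i.e. intervals of positions in the input, and each match contributes one tuple to the extracted relation. The sketch below approximates this with Python's named groups; note that re.finditer enumerates only non-overlapping matches, whereas document spanners collect all matches, so this illustrates the span relation rather than serving as a faithful spanner evaluator. All names are ours.

        import re

        def extract_spans(pattern, text):
            """Return one row per match, mapping each capture variable (named
            group) to its span -- a half-open interval [i, j) of positions."""
            rows = []
            for m in re.finditer(pattern, text):
                rows.append({v: m.span(v) for v in m.re.groupindex})
            return rows

        # Bind x to a word and y to the following word.
        print(extract_spans(r"(?P<x>\w+)\s+(?P<y>\w+)", "ab cd"))
        # [{'x': (0, 2), 'y': (3, 5)}]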

    Verifying proofs in constant depth

    In this paper we initiate the study of proof systems where verification of proofs proceeds by NC⁰ circuits. We investigate which languages admit proof systems in this very restricted model. Formulated alternatively, we ask which languages can be enumerated by NC⁰ functions. Our results show that the answer to this problem is not determined by the complexity of the language. On the one hand, we construct NC⁰ proof systems for a variety of languages ranging from regular to NP-complete. On the other hand, we show by combinatorial methods that even easy regular languages such as Exact-OR do not admit NC⁰ proof systems. We also present a general construction of proof systems for regular languages with strongly connected NFAs.
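
    For concreteness, Exact-OR is the set of Boolean strings containing exactly one 1. It is about as simple as a regular language gets, which is what makes the negative result striking. A two-line membership check, ours and purely for illustration:

        def in_exact_or(w: str) -> bool:
            # Exact-OR: Boolean strings with exactly one 1; regular, e.g. 0*10*.
            return set(w) <= {"0", "1"} and w.count("1") == 1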

    Optimising Unicode Regular Expression Evaluation with Previews

    The jsre regular expression library was designed to provide fast matching of complex expressions over large input streams using user-selectable character encodings. An established design approach was used: a simulated non-deterministic finite automaton (NFA) implemented as a virtual machine, avoiding exponential cost in either space or time. A deterministic finite automaton (DFA) was chosen as a general dispatching mechanism for Unicode character classes, and this also provided the opportunity to use compact DFAs in various optimization strategies. The result was the development of the regular expression Preview, which summarizes all the matches possible from a given point in a regular expression in a form that can be implemented as a compact DFA and used to further improve the performance of the standard NFA simulation algorithm. This paper formally defines previews and then describes and evaluates several optimizations built on the construct. They provide significant speed improvements, accrued from fast scanning of anchor positions, avoiding retesting of repeated strings in unanchored searches, and efficient searching of multiple alternate expressions, which in the case of keyword searching takes time logarithmic in the number of words to be searched.
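
    The crudest useful instance of the preview idea is a one-character preview: the set of characters that can begin a match, used to skip hopeless anchor positions before running the full NFA simulation. Here is a sketch of that special case over a toy regex AST of our own devising; the paper's previews are strictly more general, summarizing longer prefixes as compact DFAs.

        def nullable(ast):
            """Can this expression match the empty string?"""
            kind = ast[0]
            if kind == "char":
                return False
            if kind == "alt":
                return nullable(ast[1]) or nullable(ast[2])
            if kind == "cat":
                return nullable(ast[1]) and nullable(ast[2])
            return True                               # "star"

        def first_chars(ast):
            """One-character preview: every character that can start a match."""
            kind = ast[0]
            if kind == "char":                        # ("char", "a")
                return {ast[1]}
            if kind == "alt":                         # ("alt", l, r)
                return first_chars(ast[1]) | first_chars(ast[2])
            if kind == "cat":                         # ("cat", l, r)
                first = first_chars(ast[1])
                return first | first_chars(ast[2]) if nullable(ast[1]) else first
            return first_chars(ast[1])                # ("star", inner)

        def anchor_positions(ast, text):
            """Positions worth handing to the (expensive) NFA simulation."""
            starters = first_chars(ast)
            return [i for i, c in enumerate(text) if c in starters]

        # (a|b)c* can only start with "a" or "b":
        print(anchor_positions(("cat", ("alt", ("char", "a"), ("char", "b")),
                                ("star", ("char", "c"))), "xaxbcc"))  # [1, 3]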

    Parametric Linear Dynamic Logic

    We introduce Parametric Linear Dynamic Logic (PLDL), which extends Linear Dynamic Logic (LDL) by temporal operators equipped with parameters that bound their scope. LDL was proposed as an extension of Linear Temporal Logic (LTL) that is able to express all ω-regular specifications while still maintaining many of LTL's desirable properties, like an intuitive syntax and a translation into non-deterministic Büchi automata of exponential size. But LDL lacks capabilities to express timing constraints. By adding parameterized operators to LDL, we obtain a logic that is able to express all ω-regular properties and that subsumes parameterized extensions of LTL like Parametric LTL and PROMPT-LTL. Our main technical contribution is a translation of PLDL formulas into non-deterministic Büchi word automata of exponential size via alternating automata. This yields a PSPACE model checking algorithm and a realizability algorithm with doubly-exponential running time. Furthermore, we give tight upper and lower bounds on optimal parameter values for both problems. These results show that PLDL model checking and realizability are not harder than LTL model checking and realizability.
    Comment: In Proceedings GandALF 2014, arXiv:1408.556
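
    To illustrate what a parameterized operator bounds, here is the semantics of the parameterized eventuality in the style of Parametric LTL, which PLDL subsumes; this is standard PLTL notation, not a quotation from the paper. Given a word w, a position i, and a valuation α of the parameter variables:

        % Parameterized eventuality: \varphi must hold within \alpha(x) steps.
        (w, i, \alpha) \models \mathbf{F}_{\leq x}\,\varphi
          \iff
          \exists j.\; i \leq j \leq i + \alpha(x)
          \ \text{and}\ (w, j, \alpha) \models \varphi

    Model checking and realizability additionally quantify over such valuations, which is where the paper's bounds on optimal parameter values come in.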

    Stream Processing using Grammars and Regular Expressions

    In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass, optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as are required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string-processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and on a finite-state specialization of the streaming parsing algorithm presented in the first part. The second part also develops a new linear-time streaming parsing algorithm for parsing expression grammars (PEGs) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme of Cousot and Cousot.
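
    Greedy disambiguation means the parse tree is the one a depth-first backtracking matcher finds first: left alternatives are preferred, and a star consumes as much as it can. The exponential-time sketch below (our encoding, not the dissertation's) makes that order explicit by enumerating bit-coded parses greedily; the dissertation's contribution is computing the same first full parse in linear time, streamed.

        def parses(ast, s, i):
            """Yield (bitcode, end) pairs in greedy order: the left branch of an
            alternative first, and a star continuing before it stops.  The first
            full parse in this order is the Perl-style greedy parse."""
            kind = ast[0]
            if kind == "char":                       # ("char", "a")
                if i < len(s) and s[i] == ast[1]:
                    yield [], i + 1
            elif kind == "alt":                      # ("alt", l, r)
                for code, j in parses(ast[1], s, i):
                    yield [0] + code, j
                for code, j in parses(ast[2], s, i):
                    yield [1] + code, j
            elif kind == "cat":                      # ("cat", l, r)
                for c1, j in parses(ast[1], s, i):
                    for c2, k in parses(ast[2], s, j):
                        yield c1 + c2, k
            elif kind == "star":                     # ("star", inner)
                for c1, j in parses(ast[1], s, i):
                    if j > i:                        # insist on progress
                        for c2, k in parses(ast, s, j):
                            yield [0] + c1 + c2, k
                yield [1], i                         # stopping comes last: greedy

        def greedy_parse(ast, s):
            """Bit-code of the greedy parse of the whole string, or None."""
            return next((c for c, j in parses(ast, s, 0) if j == len(s)), None)

        print(greedy_parse(("star", ("alt", ("char", "a"), ("char", "b"))),
                           "ab"))                    # [0, 0, 0, 1, 1]

    The bit-code compactly serializes the parse tree: each 0/1 records which branch an alternative took or whether a star continued, which is also why a streaming algorithm can emit it incrementally.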