    Complexity of Left-Ideal, Suffix-Closed and Suffix-Free Regular Languages

    A language $L$ over an alphabet $\Sigma$ is suffix-convex if, for any words $x,y,z\in\Sigma^*$, whenever $z$ and $xyz$ are in $L$, then so is $yz$. Suffix-convex languages include three special cases: left-ideal, suffix-closed, and suffix-free languages. We examine complexity properties of these three special classes of suffix-convex regular languages. In particular, we study the quotient/state complexity of boolean operations, product (concatenation), star, and reversal on these languages, as well as the size of their syntactic semigroups, and the quotient complexity of their atoms. Comment: 20 pages, 11 figures, 1 table. arXiv admin note: text overlap with arXiv:1605.0669
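
    The definitions above can be checked mechanically on small examples. The following is a minimal sketch, assuming the language is finite and given as a Python set of strings; the function names are illustrative and do not come from the paper. Left ideals are omitted because a nonempty left ideal is always infinite and cannot be represented as a finite set.

```python
def suffixes(w):
    """All suffixes of w, longest first, including w itself and the empty word."""
    return [w[i:] for i in range(len(w) + 1)]


def is_suffix_closed(lang):
    """Every suffix of every word of lang is again in lang."""
    return all(s in lang for w in lang for s in suffixes(w))


def is_suffix_free(lang):
    """No word of lang is a proper suffix of another word of lang."""
    return not any(s in lang for w in lang for s in suffixes(w)[1:])


def is_suffix_convex(lang):
    """Whenever z and xyz are in lang, yz is in lang as well."""
    for w in lang:                      # w plays the role of xyz
        sufs = suffixes(w)
        for j, z in enumerate(sufs):
            if z in lang:
                # The suffixes of w longer than z are exactly the words yz.
                if any(u not in lang for u in sufs[:j]):
                    return False
    return True
```

    For instance, {"b", "aab"} is not suffix-convex because "ab" is missing, while adding "ab" repairs it. Every suffix-closed or suffix-free finite language passes the suffix-convexity check, in line with the abstract's statement that these are special cases.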

    Quotient Complexity of Regular Languages

    Past research on the state complexity of operations on regular languages is examined, and a new approach based on an old method (derivatives of regular expressions) is presented. Since state complexity is a property of a language, it is appropriate to define it in formal-language terms as the number of distinct quotients of the language, and to call it "quotient complexity". The problem of finding the quotient complexity of a language f(K,L) is considered, where K and L are regular languages and f is a regular operation, for example, union or concatenation. Since quotients can be represented by derivatives, one can find a formula for the typical quotient of f(K,L) in terms of the quotients of K and L. To obtain an upper bound on the number of quotients of f(K,L), all one has to do is count how many such quotients are possible, and this makes automaton constructions unnecessary. The advantages of this point of view are illustrated by many examples. Moreover, new general observations are presented to help in the estimation of the upper bounds on quotient complexity of regular operations.
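
    To make the notion of counting quotients concrete, here is a minimal sketch that computes the quotient complexity of a finite language directly from its left quotients, with no automaton involved. It assumes the language is a finite set of strings over single-character letters; the function names are illustrative. Regular languages in general would need derivatives of regular expressions, which this sketch does not implement.

```python
def quotient(lang, a):
    """Left quotient of a finite language by the letter a: { w : a w in lang }."""
    return frozenset(w[1:] for w in lang if w[:1] == a)


def quotients(lang, alphabet):
    """All distinct quotients of lang by words over alphabet (including lang itself)."""
    start = frozenset(lang)
    seen = {start}
    todo = [start]
    while todo:
        q = todo.pop()
        for a in alphabet:
            d = quotient(q, a)
            if d not in seen:
                seen.add(d)
                todo.append(d)
    return seen


def quotient_complexity(lang, alphabet):
    """Number of distinct quotients of lang."""
    return len(quotients(lang, alphabet))
```

    For K = {"a"} and L = {"b"} over {a, b}, each language has 3 quotients and so does their union. Since the quotient of a union by a word is the union of the two quotients by that word, the count for the union can never exceed the product of the two counts, which is the kind of upper-bound argument the abstract describes.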

    The Magic Number Problem for Subregular Language Families

    We investigate the magic number problem, that is, the question whether there exists a minimal $n$-state nondeterministic finite automaton (NFA) whose equivalent minimal deterministic finite automaton (DFA) has $\alpha$ states, for all $n$ and $\alpha$ satisfying $n \le \alpha \le 2^n$. A number $\alpha$ not satisfying this condition is called a magic number (for $n$). It was shown in [11] that no magic numbers exist for general regular languages, while in [5] trivial and non-trivial magic numbers for unary regular languages were identified. We obtain similar results for automata accepting subregular languages like, for example, combinational languages, star-free, prefix-, suffix-, and infix-closed languages, and prefix-, suffix-, and infix-free languages, showing that there are only trivial magic numbers, when they exist. For finite languages we obtain some partial results showing that certain numbers are non-magic. Comment: In Proceedings DCFS 2010, arXiv:1008.127
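
    The quantity at stake, the size of the minimal DFA equivalent to a given NFA, can be computed for small examples as follows. This is a minimal sketch under stated assumptions: the NFA has no epsilon transitions and is given as a dictionary from (state, letter) pairs to sets of states; it uses the textbook subset construction followed by Moore-style partition refinement, and all names are illustrative. It does not check that the input NFA is itself minimal.

```python
def determinize(delta, starts, finals, alphabet):
    """Subset construction restricted to the reachable subsets."""
    start = frozenset(starts)
    trans = {}
    todo = [start]
    while todo:
        S = todo.pop()
        if S in trans:
            continue
        trans[S] = {}
        for a in alphabet:
            T = frozenset(q for p in S for q in delta.get((p, a), ()))
            trans[S][a] = T
            todo.append(T)
    accepting = {S for S in trans if S & set(finals)}
    return trans, accepting


def minimal_dfa_size(trans, accepting, alphabet):
    """Number of classes after refining the accepting/non-accepting partition."""
    block = {S: (S in accepting) for S in trans}
    while True:
        ids = {}
        new_block = {}
        for S in trans:
            # A state's signature: its own class plus the classes of its successors.
            sig = (block[S],) + tuple(block[trans[S][a]] for a in alphabet)
            new_block[S] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)             # partition is stable
        block = new_block


def dfa_size_of_nfa(delta, starts, finals, alphabet):
    trans, accepting = determinize(delta, starts, finals, alphabet)
    return minimal_dfa_size(trans, accepting, alphabet)


# 3-state NFA for "the second-to-last letter is a" over {a, b}.
example = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
print(dfa_size_of_nfa(example, {0}, {2}, "ab"))
```

    In the terminology of the abstract, the example shows that the value 4 is attained by a 3-state NFA.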

    Streaming Property Testing of Visibly Pushdown Languages

    In the context of language recognition, we demonstrate the superiority of streaming property testers over streaming algorithms and property testers when the latter are not combined. Initiated by Feigenbaum et al., a streaming property tester is a streaming algorithm recognizing a language under the property testing approximation: it must distinguish inputs of the language from those that are $\varepsilon$-far from it, while using the smallest possible memory (rather than limiting its number of input queries). Our main result is a streaming $\varepsilon$-property tester for visibly pushdown languages (VPL) with one-sided error using memory space $\mathrm{poly}((\log n) / \varepsilon)$. This construction relies on a (non-streaming) property tester for weighted regular languages based on a previous tester by Alon et al. We provide a simple application of this tester for streaming testing special cases of instances of VPL that are already hard for both streaming algorithms and property testers. Our main algorithm proceeds in two steps. The first is an original simulation of visibly pushdown automata using a stack with small height but possibly items of linear size. In the second step, those items are replaced by small sketches. Those sketches rely on a notion of suffix-sampling that we introduce. This sampling is the key idea connecting our streaming tester algorithm to property testers. Comment: 23 pages. Major modifications in the presentation

    Regular Languages meet Prefix Sorting

    Indexing strings via prefix (or suffix) sorting is, arguably, one of the most successful algorithmic techniques developed in the last decades. Can indexing be extended to languages? The main contribution of this paper is to initiate the study of the sub-class of regular languages accepted by an automaton whose states can be prefix-sorted. Starting from the recent notion of Wheeler graph [Gagie et al., TCS 2017], which naturally extends the concept of prefix sorting to labeled graphs, we investigate the properties of Wheeler languages, that is, regular languages admitting an accepting Wheeler finite automaton. Interestingly, we characterize this family as the natural extension of regular languages endowed with the co-lexicographic ordering: when sorted, the strings belonging to a Wheeler language are partitioned into a finite number of co-lexicographic intervals, each formed by elements from a single Myhill-Nerode equivalence class. Moreover: (i) We show that every Wheeler NFA (WNFA) with $n$ states admits an equivalent Wheeler DFA (WDFA) with at most $2n-1-|\Sigma|$ states that can be computed in $O(n^3)$ time. This is in sharp contrast with general NFAs. (ii) We describe a quadratic algorithm to prefix-sort a proper superset of the WDFAs, an $O(n\log n)$-time online algorithm to sort acyclic WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By contribution (i), our algorithms can also be used to index any WNFA at the moderate price of doubling the automaton's size. (iii) We provide a minimization theorem that characterizes the smallest WDFA recognizing the same language as any input WDFA. The corresponding constructive algorithm runs in optimal linear time in the acyclic case, and in $O(n\log n)$ time in the general case. (iv) We show how to compute the smallest WDFA equivalent to any acyclic DFA in nearly-optimal time. Comment: added minimization theorems; uploaded submitted version; New version with new results (W-MH theorem, linear determinization), added author: Giovanna D'Agostin
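
    The two orderings mentioned in this abstract can be illustrated on small inputs. The sketch below assumes the usual definition of a Wheeler graph from Gagie et al.: nodes without incoming edges come first, edges with smaller labels lead to strictly smaller targets, and among equally labeled edges the order of sources is preserved by the targets. The function names and the edge representation (source, target, label) are illustrative and are not taken from the paper; the check is a brute-force quadratic test, not one of the paper's algorithms.

```python
def colex_sorted(words):
    """Sort words co-lexicographically, i.e. by comparing them right to left."""
    return sorted(words, key=lambda w: w[::-1])


def is_wheeler_order(order, edges):
    """Check whether a total order of the nodes satisfies the Wheeler conditions.

    order: list of all nodes, smallest first.
    edges: iterable of (source, target, label) triples.
    """
    edges = list(edges)
    pos = {v: i for i, v in enumerate(order)}
    has_incoming = {v for (_, v, _) in edges}

    # Nodes with in-degree 0 must precede all nodes with incoming edges.
    seen_incoming = False
    for v in order:
        if v in has_incoming:
            seen_incoming = True
        elif seen_incoming:
            return False

    for (u, v, a) in edges:
        for (u2, v2, a2) in edges:
            # Smaller label implies strictly smaller target.
            if a < a2 and not pos[v] < pos[v2]:
                return False
            # Same label: source order is preserved (weakly) by the targets.
            if a == a2 and pos[u] < pos[u2] and not pos[v] <= pos[v2]:
                return False
    return True
```

    For the edges (0, 1, "a"), (0, 2, "b"), (1, 2, "b"), the order [0, 1, 2] passes the check while [0, 2, 1] fails it, because the "a"-edge would then reach a node placed after the target of a "b"-edge.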

    State complexity of catenation combined with a boolean operation: a unified approach

    In this paper we study the state complexity of catenation combined with symmetric difference. First, an upper bound is computed using some combinatorial tools. Then, this bound is shown to be tight by giving a witness for it. Moreover, we relate this work to the study of state complexity for two other combinations: catenation with union and catenation with intersection. We also extract a unified approach that allows one to obtain the state complexity of any combination involving catenation and a binary boolean operation.

    Covering and separation for logical fragments with modular predicates

    For every class $\mathscr{C}$ of word languages, one may associate a decision problem called $\mathscr{C}$-separation. Given two regular languages, it asks whether there exists a third language in $\mathscr{C}$ containing the first language, while being disjoint from the second one. Usually, finding an algorithm deciding $\mathscr{C}$-separation yields deep insight into $\mathscr{C}$. We consider classes defined by fragments of first-order logic. Given such a fragment, one may often build a larger class by adding more predicates to its signature. In the paper, we investigate the operation of enriching signatures with modular predicates. Our main theorem is a generic transfer result for this construction. Informally, we show that when a logical fragment is equipped with a signature containing the successor predicate, separation for the stronger logic enriched with modular predicates reduces to separation for the original logic. This result actually applies to a more general decision problem, called the covering problem.

    Some Single and Combined Operations on Formal Languages: Algebraic Properties and Complexity

    In this thesis, we consider several research questions related to language operations in the following areas of automata and formal language theory: reversibility of operations, generalizations of (comma-free) codes, generalizations of basic operations, language equations, and state complexity. Motivated by cryptography applications, we investigate several reversibility questions with respect to the parallel insertion and deletion operations. Among the results we obtained, the following is of particular interest. For languages $L_1, L_2 \subseteq \Sigma^*$, if $L_2$ satisfies the condition $L_2\Sigma L_2 \cap \Sigma^+L_2\Sigma^+ = \emptyset$, then any language $L_1$ can be recovered after first parallel-inserting $L_2$ into $L_1$ and then parallel-deleting $L_2$ from the result. This property is reminiscent of the definition of comma-free codes. Following this observation, we define the notions of comma codes and k-comma codes, and then generalize them to comma intercodes and k-comma intercodes, respectively. Besides proving that all these new codes are indeed codes, we obtain some interesting properties, as well as several hierarchical results among the families of the new codes and some existing codes such as comma-free codes, infix codes, and bifix codes. Another topic considered in this thesis is a set of natural generalizations of basic language operations. We introduce block insertion on trajectories and block deletion on trajectories, which properly generalize several sequential as well as parallel binary language operations such as catenation, sequential insertion, k-insertion, parallel insertion, quotient, sequential deletion, k-deletion, etc. We obtain several closure properties of the families of regular and context-free languages under the new operations by using some relationships between these new operations and shuffle and deletion on trajectories. Also, we obtain several decidability results for language equation problems with respect to the new operations. Lastly, we study the state complexity of the following combined operations: $L_1L_2^*$, $L_1L_2^R$, $L_1(L_2 \cap L_3)$, $L_1(L_2 \cup L_3)$, $(L_1L_2)^R$, $L_1^*L_2$, $L_1^RL_2$, $(L_1 \cap L_2)L_3$, $(L_1 \cup L_2)L_3$, $L_1L_2 \cap L_3$, and $L_1L_2 \cup L_3$ for regular languages $L_1$, $L_2$, and $L_3$. These are all the combinations of two basic operations whose state complexities have not been studied in the literature.