
    The Magic Number Problem for Subregular Language Families

    We investigate the magic number problem, that is, the question whether there exists a minimal $n$-state nondeterministic finite automaton (NFA) whose equivalent minimal deterministic finite automaton (DFA) has $\alpha$ states, for all $n$ and $\alpha$ satisfying $n \le \alpha \le 2^n$. A number $\alpha$ not satisfying this condition is called a magic number (for $n$). It was shown in [11] that no magic numbers exist for general regular languages, while in [5] trivial and non-trivial magic numbers for unary regular languages were identified. We obtain similar results for automata accepting subregular languages like, for example, combinational languages, star-free, prefix-, suffix-, and infix-closed languages, and prefix-, suffix-, and infix-free languages, showing that there are only trivial magic numbers, when they exist. For finite languages we obtain some partial results showing that certain numbers are non-magic. Comment: In Proceedings DCFS 2010, arXiv:1008.127
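    As a rough illustration of the quantities in this abstract, the sketch below determinizes a small NFA by the subset construction and counts the states of the equivalent minimal DFA using Moore-style partition refinement. The function names and the example automaton are assumptions made for illustration and are not taken from the paper; the sketch also does not check that the NFA itself is minimal, which the magic number problem requires.

```python
def determinize(alphabet, delta, start, finals):
    """Subset construction: build the reachable part of the equivalent DFA.

    delta maps (state, symbol) -> set of successor states (hypothetical encoding).
    DFA states are frozensets of NFA states; the empty set acts as a dead state.
    """
    start_set = frozenset([start])
    seen, todo, trans = {start_set}, [start_set], {}
    while todo:
        subset = todo.pop()
        for a in alphabet:
            target = frozenset(r for q in subset for r in delta.get((q, a), ()))
            trans[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_finals = {s for s in seen if s & finals}
    return seen, trans, dfa_finals


def minimal_dfa_size(alphabet, states, trans, finals):
    """Count equivalence classes of a complete, reachable DFA by refinement."""
    block = {s: int(s in finals) for s in states}
    while True:
        ids, refined = {}, {}
        for s in states:
            # A state's signature: its own block plus its successors' blocks.
            sig = (block[s],) + tuple(block[trans[(s, a)]] for a in alphabet)
            refined[s] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)  # no block was split, so the partition is stable
        block = refined


# Example NFA (illustrative): words over {a, b} whose second-to-last symbol is a.
alphabet = ("a", "b")
delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
states, trans, dfa_finals = determinize(alphabet, delta, 0, {2})
alpha = minimal_dfa_size(alphabet, states, trans, dfa_finals)
print(alpha)  # number of states of the minimal DFA
```

    For this 3-state NFA the minimal DFA has 4 states, which lies in the range $3 \le \alpha \le 2^3$ considered by the problem.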

    Streaming Property Testing of Visibly Pushdown Languages

    In the context of language recognition, we demonstrate the superiority of streaming property testers over streaming algorithms and property testers used separately. Initiated by Feigenbaum et al., a streaming property tester is a streaming algorithm recognizing a language under the property testing approximation: it must distinguish inputs of the language from those that are $\varepsilon$-far from it, while using the smallest possible memory (rather than limiting its number of input queries). Our main result is a streaming $\varepsilon$-property tester for visibly pushdown languages (VPL) with one-sided error using memory space $\mathrm{poly}((\log n)/\varepsilon)$. This construction relies on a (non-streaming) property tester for weighted regular languages based on a previous tester by Alon et al. We provide a simple application of this tester for streaming testing of special cases of VPL instances that are already hard for both streaming algorithms and property testers. Our main algorithm first uses an original simulation of visibly pushdown automata with a stack of small height but with items of possibly linear size. In a second step, those items are replaced by small sketches. Those sketches rely on a notion of suffix-sampling that we introduce. This sampling is the key idea connecting our streaming tester algorithm to property testers. Comment: 23 pages. Major modifications in the presentatio

    Regular Languages meet Prefix Sorting

    Indexing strings via prefix (or suffix) sorting is, arguably, one of the most successful algorithmic techniques developed in the last decades. Can indexing be extended to languages? The main contribution of this paper is to initiate the study of the sub-class of regular languages accepted by an automaton whose states can be prefix-sorted. Starting from the recent notion of Wheeler graph [Gagie et al., TCS 2017], which extends naturally the concept of prefix sorting to labeled graphs, we investigate the properties of Wheeler languages, that is, regular languages admitting an accepting Wheeler finite automaton. Interestingly, we characterize this family as the natural extension of regular languages endowed with the co-lexicographic ordering: when sorted, the strings belonging to a Wheeler language are partitioned into a finite number of co-lexicographic intervals, each formed by elements from a single Myhill-Nerode equivalence class. Moreover: (i) We show that every Wheeler NFA (WNFA) with $n$ states admits an equivalent Wheeler DFA (WDFA) with at most $2n-1-|\Sigma|$ states that can be computed in $O(n^3)$ time. This is in sharp contrast with general NFAs. (ii) We describe a quadratic algorithm to prefix-sort a proper superset of the WDFAs, an $O(n\log n)$-time online algorithm to sort acyclic WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By contribution (i), our algorithms can also be used to index any WNFA at the moderate price of doubling the automaton's size. (iii) We provide a minimization theorem that characterizes the smallest WDFA recognizing the same language as any input WDFA. The corresponding constructive algorithm runs in optimal linear time in the acyclic case, and in $O(n\log n)$ time in the general case. (iv) We show how to compute the smallest WDFA equivalent to any acyclic DFA in nearly-optimal time. Comment: added minimization theorems; uploaded submitted version; New version with new results (W-MH theorem, linear determinization), added author: Giovanna D'Agostin
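    The co-lexicographic ordering mentioned in this abstract compares strings from right to left. The sketch below sorts a finite sample of words that way and groups consecutive words by the DFA state they reach. The toy DFA (for the language a*b*), the sample, and the function names are assumptions for illustration only; on a finite sample this merely hints at the interval structure described above and proves nothing about the language.

```python
from itertools import groupby


def colex_sorted(words):
    """Sort strings co-lexicographically, i.e. by their reversed spelling."""
    return sorted(words, key=lambda w: w[::-1])


def state_runs(words, delta, start):
    """Return (state, run length) pairs over the co-lex sorted words.

    delta maps (state, symbol) -> state; every word is assumed to be readable
    by the DFA (hypothetical encoding, no dead state).
    """
    def reached(word):
        q = start
        for c in word:
            q = delta[(q, c)]
        return q

    labels = [reached(w) for w in colex_sorted(words)]
    return [(state, len(list(group))) for state, group in groupby(labels)]


# Toy minimal DFA for a*b* (illustrative): state 0 reads a's, state 1 reads b's.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1}
sample = ["", "a", "b", "aa", "ab", "bb"]  # words of a*b* up to length 2

print(colex_sorted(sample))
print(state_runs(sample, delta, 0))
```

    In this sample the sorted words fall into two runs, one per state of the toy DFA.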

    Complexity of Left-Ideal, Suffix-Closed and Suffix-Free Regular Languages

    A language $L$ over an alphabet $\Sigma$ is suffix-convex if, for any words $x,y,z\in\Sigma^*$, whenever $z$ and $xyz$ are in $L$, then so is $yz$. Suffix-convex languages include three special cases: left-ideal, suffix-closed, and suffix-free languages. We examine complexity properties of these three special classes of suffix-convex regular languages. In particular, we study the quotient/state complexity of boolean operations, product (concatenation), star, and reversal on these languages, as well as the size of their syntactic semigroups, and the quotient complexity of their atoms. Comment: 20 pages, 11 figures, 1 table. arXiv admin note: text overlap with arXiv:1605.0669
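    The definition of suffix-convexity can be checked by brute force on a finite language. The sketch below is only an illustration under that assumption (the paper concerns regular languages in general, and left ideals are infinite, so they are not covered here); the function names are hypothetical.

```python
def is_suffix_convex(lang):
    """Check: whenever z and xyz are in lang, every yz in between is too."""
    lang = set(lang)
    for w in lang:                      # w plays the role of xyz
        for i in range(len(w) + 1):     # z = w[i:]
            if w[i:] in lang:
                # every longer suffix yz = w[j:], j < i, must also be in lang
                if any(w[j:] not in lang for j in range(i)):
                    return False
    return True


def is_suffix_closed(lang):
    """Every suffix of every word in lang is again in lang."""
    lang = set(lang)
    return all(w[i:] in lang for w in lang for i in range(len(w) + 1))


def is_suffix_free(lang):
    """No word in lang is a proper suffix of another word in lang."""
    lang = set(lang)
    return not any(w[i:] in lang for w in lang for i in range(1, len(w) + 1))


print(is_suffix_convex({"ab", "b", ""}))   # a suffix-closed example
print(is_suffix_convex({"ab", "ba"}))      # a suffix-free example
print(is_suffix_convex({"aab", "b"}))      # "ab" is missing between "b" and "aab"
```

    The first two sets are suffix-convex because they are suffix-closed and suffix-free respectively; the third is not, since $z = b$ and $xyz = aab$ are present but $yz = ab$ is absent.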

    The middle as a voice category in Bantu : setting the stage for further research

    The main goal of our paper is to give a first, general description of middle voice in Bantu. As will be shown, this language group has a set of verbal derivational morphemes that challenges some of the concepts related to the middle domain. First of all, as of yet no description has been found of a language having more than one middle marker, yet many Bantu languages have up to four or five derivational morphemes that cover several parts of the semantic domain of the middle. Secondly, provided that the polysemy patterns of these morphemes only partially cover what is generally considered the “canonical” middle domain, we will call these “quasi-middle” markers. The fact that these verbal morphemes also convey notions that are usually not considered to belong to the domain of the canonical middle calls for a reassessment of what constitutes the semantic core of this voice category cross-linguistically. Although the theoretical implications of these new data are not the central focus of our paper, the basic description that we aim to provide of the middle in Bantu can nevertheless contribute to further discussion on this intricate voice category

    Partially-commutative context-free languages

    The paper is about a class of languages that extends context-free languages (CFL) and is stable under shuffle. Specifically, we investigate the class of partially-commutative context-free languages (PCCFL), where non-terminal symbols are commutative according to a binary independence relation, very much like in trace theory. The class has been recently proposed as a robust class subsuming CFL and commutative CFL. This paper surveys properties of PCCFL. We identify a natural corresponding automaton model: stateless multi-pushdown automata. We show stability of the class under natural operations, including homomorphic images and shuffle. Finally, we relate expressiveness of PCCFL to two other relevant classes: CFL extended with shuffle and trace-closures of CFL. Among technical contributions of the paper are pumping lemmas, as an elegant completion of known pumping properties of regular languages, CFL and commutative CFL. Comment: In Proceedings EXPRESS/SOS 2012, arXiv:1208.244

    Fast Label Extraction in the CDAWG

    The compact directed acyclic word graph (CDAWG) of a string $T$ of length $n$ takes space proportional just to the number $e$ of right extensions of the maximal repeats of $T$, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which $e$ grows significantly more slowly than $n$. We reduce from $O(m\log\log n)$ to $O(m)$ the time needed to count the number of occurrences of a pattern of length $m$, using an existing data structure that takes an amount of space proportional to the size of the CDAWG. This implies a reduction from $O(m\log\log n+\mathtt{occ})$ to $O(m+\mathtt{occ})$ in the time needed to locate all the $\mathtt{occ}$ occurrences of the pattern. We also reduce from $O(k\log\log n)$ to $O(k)$ the time needed to read the $k$ characters of the label of an edge of the suffix tree of $T$, and we reduce from $O(m\log\log n)$ to $O(m)$ the time needed to compute the matching statistics between a query of length $m$ and $T$, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG. Comment: 16 pages, 1 figure. In proceedings of the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017). arXiv admin note: text overlap with arXiv:1705.0864