The Magic Number Problem for Subregular Language Families

Author: A. Matsuura
A. R. Meyer
F. R. Moore
G. Jirásková
Giovanni Pighizzini
H. Bordihn
Huei-Jan Shyr
Ian McQuillan
J. A. Brzozowski
Jozef Jirásek
Jui-Yi Kao
K. Iwama
K. Salomaa
M. Holzer
M. O. Rabin
Markus Holzer
Martin Kutrib
O. B. Lupanov
R. Mandl
R. McNaughton
Sebastian Jakobi
V. Geffert
Publication venue: 'Open Publishing Association'
Publication date: 01/08/2010

We investigate the magic number problem, that is, the question of whether there exists a minimal $n$-state nondeterministic finite automaton (NFA) whose equivalent minimal deterministic finite automaton (DFA) has $\alpha$ states, for all $n$ and $\alpha$ satisfying $n \le \alpha \le 2^n$. A number $\alpha$ in this range for which no such NFA exists is called a magic number (for $n$). It was shown in [11] that no magic numbers exist for general regular languages, while in [5] trivial and non-trivial magic numbers for unary regular languages were identified. We obtain similar results for automata accepting subregular languages such as combinational languages, star-free languages, prefix-, suffix-, and infix-closed languages, and prefix-, suffix-, and infix-free languages, showing that there are only trivial magic numbers, when they exist. For finite languages we obtain some partial results showing that certain numbers are non-magic.

Comment: In Proceedings DCFS 2010, arXiv:1008.127
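As an illustration of the quantity the abstract above is about, the following is a minimal sketch (not taken from the paper) that computes the number of states of the minimal DFA equivalent to a given NFA, using the textbook subset construction followed by Moore-style partition refinement. The function name `dfa_size_of_nfa` and the dictionary encoding of the transition relation are assumptions made for this example, and the count refers to a complete DFA, so a reachable dead state is included.

```python
def dfa_size_of_nfa(delta, starts, finals, alphabet):
    """Size of the minimal complete DFA equivalent to an NFA.

    delta maps (state, symbol) to a set of successor states
    (hypothetical encoding chosen for this sketch).
    """
    finals = set(finals)
    start = frozenset(starts)

    # Subset construction, restricted to reachable subsets.
    dfa_delta, seen, todo = {}, {start}, [start]
    while todo:
        subset = todo.pop()
        for a in alphabet:
            target = frozenset(q for p in subset for q in delta.get((p, a), ()))
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)

    # Moore-style refinement: start from accepting / non-accepting and
    # split blocks until the number of blocks no longer grows.
    block = {s: int(bool(s & finals)) for s in seen}
    while True:
        ids, refined = {}, {}
        for s in seen:
            signature = (block[s],) + tuple(
                block[dfa_delta[(s, a)]] for a in alphabet
            )
            refined[s] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = refined


# Example: "the third symbol from the end is a" over {a, b}, as a 4-state NFA.
third_from_end = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "a"): {2}, (1, "b"): {2},
    (2, "a"): {3}, (2, "b"): {3},
}
print(dfa_size_of_nfa(third_from_end, {0}, {3}, "ab"))
```

For a fixed $n$, running such a routine over many $n$-state NFAs gives an experimental view of which values of $\alpha$ are attained; it does not by itself prove that a value is magic.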

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Streaming Property Testing of Visibly Pushdown Languages

Author: de Rougemont Michel
François Nathanaël
Magniez Frédéric
Serre Olivier
Publication date: 03/11/2015

In the context of language recognition, we demonstrate the superiority of streaming property testers against streaming algorithms and property testers, when they are not combined. Initiated by Feigenbaum et al., a streaming property tester is a streaming algorithm recognizing a language under the property testing approximation: it must distinguish inputs of the language from those that are $\varepsilon$-far from it, while using the smallest possible memory (rather than limiting its number of input queries). Our main result is a streaming $\varepsilon$-property tester for visibly pushdown languages (VPL) with one-sided error using memory space $\mathrm{poly}((\log n) / \varepsilon)$. This construction relies on a (non-streaming) property tester for weighted regular languages based on a previous tester by Alon et al. We provide a simple application of this tester for streaming testing special cases of instances of VPL that are already hard for both streaming algorithms and property testers. Our main algorithm is a combination of an original simulation of visibly pushdown automata using a stack with small height but possible items of linear size. In a second step, those items are replaced by small sketches. Those sketches rely on a notion of suffix-sampling we introduce. This sampling is the key idea connecting our streaming tester algorithm to property testers.

Comment: 23 pages. Major modifications in the presentatio

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot

Regular Languages meet Prefix Sorting

Author: Alanko Jarno
D'Agostino Giovanna
Policriti Alberto
Prezza Nicola
Publication date: 09/07/2019

Indexing strings via prefix (or suffix) sorting is, arguably, one of the most successful algorithmic techniques developed in the last decades. Can indexing be extended to languages? The main contribution of this paper is to initiate the study of the sub-class of regular languages accepted by an automaton whose states can be prefix-sorted. Starting from the recent notion of Wheeler graph [Gagie et al., TCS 2017], which naturally extends the concept of prefix sorting to labeled graphs, we investigate the properties of Wheeler languages, that is, regular languages admitting an accepting Wheeler finite automaton. Interestingly, we characterize this family as the natural extension of regular languages endowed with the co-lexicographic ordering: when sorted, the strings belonging to a Wheeler language are partitioned into a finite number of co-lexicographic intervals, each formed by elements from a single Myhill-Nerode equivalence class. Moreover: (i) We show that every Wheeler NFA (WNFA) with $n$ states admits an equivalent Wheeler DFA (WDFA) with at most $2n-1-|\Sigma|$ states that can be computed in $O(n^3)$ time. This is in sharp contrast with general NFAs. (ii) We describe a quadratic algorithm to prefix-sort a proper superset of the WDFAs, an $O(n\log n)$-time online algorithm to sort acyclic WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By contribution (i), our algorithms can also be used to index any WNFA at the moderate price of doubling the automaton's size. (iii) We provide a minimization theorem that characterizes the smallest WDFA recognizing the same language as any input WDFA. The corresponding constructive algorithm runs in optimal linear time in the acyclic case, and in $O(n\log n)$ time in the general case. (iv) We show how to compute the smallest WDFA equivalent to any acyclic DFA in nearly-optimal time.

Comment: added minimization theorems; uploaded submitted version; New version with new results (W-MH theorem, linear determinization), added author: Giovanna D'Agostin
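The co-lexicographic ordering mentioned in the abstract above compares strings from their last character backwards. A minimal sketch of sorting a finite set of strings in that order follows; the helper name `colex_sorted` is an assumption for this example, and the snippet illustrates only the ordering, not the paper's algorithms for Wheeler automata.

```python
def colex_sorted(words):
    """Sort strings co-lexicographically, i.e. by their reversed spelling."""
    return sorted(words, key=lambda w: w[::-1])


print(colex_sorted(["ab", "ba", "aa", "b"]))
```

Strings that end in the same character stay adjacent in this order, which is the property that prefix sorting of automaton states relies on.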

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Complexity of Left-Ideal, Suffix-Closed and Suffix-Free Regular Languages

Author: B Krawetz
J Berstel
J Brzozowski
JA Brzozowski
JE Pin
M Holzer
R Cmorik
S Iván
S Yu
T Ang
YS Han
Publication date: 03/10/2016

A language $L$ over an alphabet $\Sigma$ is suffix-convex if, for any words $x,y,z\in\Sigma^*$, whenever $z$ and $xyz$ are in $L$, then so is $yz$. Suffix-convex languages include three special cases: left-ideal, suffix-closed, and suffix-free languages. We examine complexity properties of these three special classes of suffix-convex regular languages. In particular, we study the quotient/state complexity of boolean operations, product (concatenation), star, and reversal on these languages, as well as the size of their syntactic semigroups, and the quotient complexity of their atoms.

Comment: 20 pages, 11 figures, 1 table. arXiv admin note: text overlap with arXiv:1605.0669
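The definition of suffix-convexity in the abstract above can be checked directly on a finite language given as a set of words. The following is a minimal sketch under that assumption; the function name `is_suffix_convex` is hypothetical, and the paper itself deals with regular languages in general, not only finite ones.

```python
def is_suffix_convex(language):
    """Check suffix-convexity of a finite language given as an iterable of strings.

    For every word w = xyz in the language and every suffix z of w that is
    also in the language, each suffix yz lying between z and w must be in it.
    """
    words = set(language)
    for w in words:
        # in_lang[i] tells whether the suffix w[i:] belongs to the language.
        in_lang = [w[i:] in words for i in range(len(w) + 1)]
        # Index of the shortest suffix of w that is in the language.
        last = max(i for i, ok in enumerate(in_lang) if ok)
        # Every longer suffix of w must then be in the language as well.
        if not all(in_lang[: last + 1]):
            return False
    return True


print(is_suffix_convex({"a", "ba", "aba"}))  # all suffixes between "a" and "aba" present
print(is_suffix_convex({"a", "aba"}))        # "ba" is missing
```

Suffix-closed and suffix-free finite languages both pass this check, consistent with the abstract's statement that they are special cases.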

arXiv.org e-Print Archive

Crossref

University of Waterloo's Institutional Repository

The middle as a voice category in Bantu : setting the stage for further research

Author: Bostoen Koen
Dom Sebastian
Kulikov Leonid
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2018

The main goal of our paper is to give a first, general description of middle voice in Bantu. As will be shown, this language group has a set of verbal derivational morphemes that challenges some of the concepts related to the middle domain. First of all, as of yet no description has been found of a language having more than one middle marker, yet many Bantu languages have up to four or five derivational morphemes that cover several parts of the semantic domain of the middle. Secondly, provided that the polysemy patterns of these morphemes only partially cover what is generally considered the “canonical” middle domain, we will call these “quasi-middle” markers. The fact that these verbal morphemes also convey notions that are usually not considered to belong to the domain of the canonical middle calls for a reassessment of what constitutes the semantic core of this voice category cross-linguistically. Although the theoretical implications of these new data are not the central focus of our paper, the basic description that we aim to provide of the middle in Bantu can nevertheless contribute to further discussion on this intricate voice category

Ghent University Academic Bibliography

Archivsystem Ask23

DIAL UCLouvain

Partially-commutative context-free languages

Author: Bas Luttik
Bergstra
Berstel
Bouajjani
Christensen
Czerwiński
Esparza
Gischer
Hirshfeld
Mayr
Mazurkiewicz
Michel A. Reniers
Nederhof
Srba
Sławomir Lasota
Wojciech Czerwiński
Publication venue: 'Open Publishing Association'
Publication date: 01/08/2012

The paper is about a class of languages that extends context-free languages (CFL) and is stable under shuffle. Specifically, we investigate the class of partially-commutative context-free languages (PCCFL), where non-terminal symbols are commutative according to a binary independence relation, very much like in trace theory. The class has been recently proposed as a robust class subsuming CFL and commutative CFL. This paper surveys properties of PCCFL. We identify a natural corresponding automaton model: stateless multi-pushdown automata. We show stability of the class under natural operations, including homomorphic images and shuffle. Finally, we relate the expressiveness of PCCFL to two other relevant classes: CFL extended with shuffle and trace-closures of CFL. Among the technical contributions of the paper are pumping lemmas, as an elegant completion of known pumping properties of regular languages, CFL and commutative CFL.

Comment: In Proceedings EXPRESS/SOS 2012, arXiv:1208.244

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Fast Label Extraction in the CDAWG

Author: A Blumer
D Belazzougui
D Gusfield
J Sirén
L Gasieniec
LS Russo
M Crochemore
M Raffinot
MA Bender
O Berkman
T Gagie
V Mäkinen
Publication date: 26/09/2017

The compact directed acyclic word graph (CDAWG) of a string $T$ of length $n$ takes space proportional just to the number $e$ of right extensions of the maximal repeats of $T$, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which $e$ grows significantly more slowly than $n$. We reduce from $O(m\log{\log{n}})$ to $O(m)$ the time needed to count the number of occurrences of a pattern of length $m$, using an existing data structure that takes an amount of space proportional to the size of the CDAWG. This implies a reduction from $O(m\log{\log{n}}+\mathtt{occ})$ to $O(m+\mathtt{occ})$ in the time needed to locate all the $\mathtt{occ}$ occurrences of the pattern. We also reduce from $O(k\log{\log{n}})$ to $O(k)$ the time needed to read the $k$ characters of the label of an edge of the suffix tree of $T$, and we reduce from $O(m\log{\log{n}})$ to $O(m)$ the time needed to compute the matching statistics between a query of length $m$ and $T$, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG.

Comment: 16 pages, 1 figure. In proceedings of the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017). arXiv admin note: text overlap with arXiv:1705.0864

arXiv.org e-Print Archive

Crossref