The Magic Number Problem for Subregular Language Families
We investigate the magic number problem, that is, the question whether there
exists a minimal n-state nondeterministic finite automaton (NFA) whose
equivalent minimal deterministic finite automaton (DFA) has alpha states, for
all n and alpha satisfying n ≤ alpha ≤ 2^n.
A number alpha not satisfying this condition is called a magic number (for n).
It was shown in [11] that no magic numbers exist for general regular languages,
while in [5] trivial and non-trivial magic numbers for unary regular languages
were identified. We obtain similar results for automata accepting subregular
languages like, for example, combinational languages, star-free, prefix-,
suffix-, and infix-closed languages, and prefix-, suffix-, and infix-free
languages, showing that there are only trivial magic numbers, when they exist.
For finite languages we obtain some partial results showing that certain
numbers are non-magic.
Comment: In Proceedings DCFS 2010, arXiv:1008.127
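To make the quantities n and alpha in the abstract above concrete, here is a minimal sketch, not taken from the paper, that determinizes a small NFA by the subset construction and counts the states of the equivalent minimal DFA. The function name `minimal_dfa_size`, the dictionary encoding of the transition relation, and the convention that the DFA is complete (a reachable dead state is counted) are all assumptions made for illustration.

```python
from itertools import chain


def minimal_dfa_size(alphabet, delta, start, finals):
    """Count the states of the minimal complete DFA equivalent to an NFA.

    alphabet: ordered sequence of symbols (assumed encoding).
    delta: dict mapping (state, symbol) -> set of successors, no epsilon moves.
    """
    finals = frozenset(finals)
    # Subset construction, restricted to subsets reachable from {start}.
    init = frozenset([start])
    subsets = {init}
    trans = {}
    todo = [init]
    while todo:
        s = todo.pop()
        for a in alphabet:
            t = frozenset(chain.from_iterable(delta.get((q, a), ()) for q in s))
            trans[(s, a)] = t
            if t not in subsets:
                subsets.add(t)
                todo.append(t)
    # Moore-style partition refinement: start from accepting / non-accepting
    # and split blocks until the number of blocks stops growing.
    block = {s: int(bool(s & finals)) for s in subsets}
    count = len(set(block.values()))
    while True:
        ids = {}
        new_block = {}
        for s in subsets:
            sig = (block[s],) + tuple(block[trans[(s, a)]] for a in alphabet)
            new_block[s] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        block, count = new_block, len(ids)


# A 3-state NFA for "the second-to-last symbol is a" over {a, b}.
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
print(minimal_dfa_size("ab", nfa, 0, {2}))
```

With n = 3 states in this NFA, the computed DFA size is one value of alpha in the range from n to 2^n. The sketch does not check that the input NFA is itself minimal, which the magic number problem requires.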
Streaming Property Testing of Visibly Pushdown Languages
In the context of language recognition, we demonstrate the superiority of
streaming property testers over streaming algorithms and property testers
used separately. A streaming property tester, a notion introduced by
Feigenbaum et al., is a streaming algorithm recognizing a language under the
property testing approximation: it must distinguish inputs of the language from
those that are ε-far from it, while using the smallest possible
memory (rather than limiting its number of input queries).
Our main result is a streaming ε-property tester for visibly
pushdown languages (VPL) with one-sided error using memory space
.
This construction relies on a (non-streaming) property tester for weighted
regular languages based on a previous tester by Alon et al. We provide a simple
application of this tester for streaming testing special cases of instances of
VPL that are already hard for both streaming algorithms and property testers.
Our main algorithm works in two steps. First, an original simulation of
visibly pushdown automata uses a stack of small height whose items may have
linear size. Second, those items are replaced by small sketches. These
sketches rely on a notion of suffix-sampling that we introduce. This sampling is
the key idea connecting our streaming tester algorithm to property testers.
Comment: 23 pages. Major modifications in the presentatio
Regular Languages meet Prefix Sorting
Indexing strings via prefix (or suffix) sorting is, arguably, one of the most
successful algorithmic techniques developed in the last decades. Can indexing
be extended to languages? The main contribution of this paper is to initiate
the study of the sub-class of regular languages accepted by an automaton whose
states can be prefix-sorted. Starting from the recent notion of Wheeler graph
[Gagie et al., TCS 2017], which naturally extends the concept of prefix sorting
to labeled graphs, we investigate the properties of Wheeler languages, that is,
regular languages admitting an accepting Wheeler finite automaton.
Interestingly, we characterize this family as the natural extension of regular
languages endowed with the co-lexicographic ordering: when sorted, the strings
belonging to a Wheeler language are partitioned into a finite number of
co-lexicographic intervals, each formed by elements from a single Myhill-Nerode
equivalence class. Moreover: (i) We show that every Wheeler NFA (WNFA) with
states admits an equivalent Wheeler DFA (WDFA) with at most
states that can be computed in time. This is in sharp contrast with
general NFAs. (ii) We describe a quadratic algorithm to prefix-sort a proper
superset of the WDFAs, a -time online algorithm to sort acyclic
WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By
contribution (i), our algorithms can also be used to index any WNFA at the
moderate price of doubling the automaton's size. (iii) We provide a
minimization theorem that characterizes the smallest WDFA recognizing the same
language as any input WDFA. The corresponding constructive algorithm runs in
optimal linear time in the acyclic case, and in time in the
general case. (iv) We show how to compute the smallest WDFA equivalent to any
acyclic DFA in nearly-optimal time.
Comment: added minimization theorems; uploaded submitted version; New version
with new results (W-MH theorem, linear determinization), added author:
Giovanna D'Agostin
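The co-lexicographic ordering mentioned in the abstract above compares strings from right to left. As a small illustration only, and not the paper's algorithm, the sketch below sorts a finite sample of strings co-lexicographically and groups the sorted list into maximal runs that share a class label. The helper names `colex_sorted` and `colex_runs`, and the use of "ends in a" as a stand-in for a Myhill-Nerode class, are assumptions made for this example.

```python
from itertools import groupby


def colex_sorted(words):
    """Sort words co-lexicographically, i.e. by their reversed spelling."""
    return sorted(words, key=lambda w: w[::-1])


def colex_runs(words, classify):
    """Group co-lex-sorted words into maximal runs with the same label."""
    return [(label, list(run))
            for label, run in groupby(colex_sorted(words), key=classify)]


sample = ["", "a", "b", "ab", "ba"]
# Hypothetical classifier: for the language "words ending in a" over {a, b},
# two words are equivalent exactly when they agree on ending in 'a'.
for label, run in colex_runs(sample, lambda w: w.endswith("a")):
    print(label, run)
```

On this sample the sorted order splits into three runs, with each run drawn from a single class. This is only a finite-sample picture; it does not decide whether a language is Wheeler.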
Complexity of Left-Ideal, Suffix-Closed and Suffix-Free Regular Languages
A language L over an alphabet Σ is suffix-convex if, for any words
x, y, z in Σ*, whenever z and xyz are in L, then so is yz.
Suffix-convex languages include three special cases: left-ideal, suffix-closed,
and suffix-free languages. We examine complexity properties of these three
special classes of suffix-convex regular languages. In particular, we study the
quotient/state complexity of boolean operations, product (concatenation), star,
and reversal on these languages, as well as the size of their syntactic
semigroups, and the quotient complexity of their atoms.
Comment: 20 pages, 11 figures, 1 table. arXiv admin note: text overlap with
arXiv:1605.0669
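As a small illustration of suffix-convexity, the sketch below checks the property on a finite set of words. It assumes the standard definition (whenever z and xyz are in the language, so is yz); the function name `is_suffix_convex` is hypothetical, and the check covers only finite languages, so infinite classes such as left ideals are out of its reach.

```python
def is_suffix_convex(language):
    """Check suffix-convexity of a finite set of words.

    Assumed definition: whenever z and xyz are in the language, so is yz.
    """
    lang = set(language)
    for w in lang:
        for i in range(len(w) + 1):
            z = w[i:]  # candidate suffix z of w = xyz
            if z not in lang:
                continue
            # Every suffix of w that is longer than z has the form yz.
            for j in range(i):
                if w[j:] not in lang:
                    return False
    return True


print(is_suffix_convex({"a", "ba", "aba"}))  # all suffixes between a and aba present
print(is_suffix_convex({"a", "aba"}))        # ba is missing
print(is_suffix_convex({"ab", "ba"}))        # suffix-free, hence suffix-convex
```

Suffix-closed and suffix-free finite languages pass the check, which matches the abstract's statement that both are special cases of suffix-convex languages.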
The middle as a voice category in Bantu : setting the stage for further research
The main goal of our paper is to give a first, general description of middle voice in Bantu. As will be shown, this language group has a set of verbal derivational morphemes that challenges some of the concepts related to the middle domain. First of all, as of yet no description has been found of a language having more than one middle marker, yet many Bantu languages have up to four or five derivational morphemes that cover several parts of the semantic domain of the middle. Secondly, provided that the polysemy patterns of these morphemes only partially cover what is generally considered the “canonical” middle domain, we will call these “quasi-middle” markers. The fact that these verbal morphemes also convey notions that are usually not considered to belong to the domain of the canonical middle calls for a reassessment of what constitutes the semantic core of this voice category cross-linguistically. Although the theoretical implications of these new data are not the central focus of our paper, the basic description that we aim to provide of the middle in Bantu can nevertheless contribute to further discussion on this intricate voice category
Partially-commutative context-free languages
The paper is about a class of languages that extends context-free languages
(CFL) and is stable under shuffle. Specifically, we investigate the class of
partially-commutative context-free languages (PCCFL), where non-terminal
symbols are commutative according to a binary independence relation, very much
like in trace theory. The class has been recently proposed as a robust class
subsuming CFL and commutative CFL. This paper surveys properties of PCCFL. We
identify a natural corresponding automaton model: stateless multi-pushdown
automata. We show stability of the class under natural operations, including
homomorphic images and shuffle. Finally, we relate expressiveness of PCCFL to
two other relevant classes: CFL extended with shuffle and trace-closures of
CFL. Among technical contributions of the paper are pumping lemmas, as an
elegant completion of known pumping properties of regular languages, CFL and
commutative CFL.
Comment: In Proceedings EXPRESS/SOS 2012, arXiv:1208.244
Fast Label Extraction in the CDAWG
The compact directed acyclic word graph (CDAWG) of a string of length
takes space proportional just to the number of right extensions of the
maximal repeats of , and it is thus an appealing index for highly repetitive
datasets, like collections of genomes from similar species, in which grows
significantly more slowly than . We reduce from to
the time needed to count the number of occurrences of a pattern of
length , using an existing data structure that takes an amount of space
proportional to the size of the CDAWG. This implies a reduction from
to in the time needed to
locate all the occurrences of the pattern. We also reduce from
to the time needed to read the characters of the
label of an edge of the suffix tree of , and we reduce from
to the time needed to compute the matching
statistics between a query of length and , using an existing
representation of the suffix tree based on the CDAWG. All such improvements
derive from extracting the label of a vertex or of an arc of the CDAWG using a
straight-line program induced by the reversed CDAWG.
Comment: 16 pages, 1 figure. In proceedings of the 24th International
Symposium on String Processing and Information Retrieval (SPIRE 2017). arXiv
admin note: text overlap with arXiv:1705.0864