Discontinuities in pattern inference
This paper deals with the inferrability of classes of E-pattern languages (also referred
to as extended or erasing pattern languages) from positive data in Gold's
model of identification in the limit. The first main part of the paper shows that
the recently presented negative result on terminal-free E-pattern languages over binary
alphabets does not hold for other alphabet sizes, so that the full class of these
languages is inferrable from positive data if and only if the corresponding terminal
alphabet does not consist of exactly two distinct letters. The second main part yields
the insight that the positive result on terminal-free E-pattern languages over alphabets
with three or four letters cannot be extended to the class of general E-pattern
languages. With regard to larger alphabets, the extensibility remains open.
The proof methods developed for these main results do not directly discuss the
(non-)existence of appropriate learning strategies, but they deal with structural
properties of classes of E-pattern languages, and, in particular, with the problem
of finding telltales for these languages. It is shown that the inferrability of classes
of E-pattern languages is closely connected to some problems on the ambiguity
of morphisms, so that the technical contributions of the paper largely consist of
combinatorial insights into morphisms in word monoids.
Unambiguous 1-Uniform Morphisms
A morphism h is unambiguous with respect to a word w if there is no other
morphism g that maps w to the same image as h. In the present paper we study
the question of whether, for any given word, there exists an unambiguous
1-uniform morphism, i.e., a morphism that maps every letter in the word to an
image of length 1.
Comment: In Proceedings WORDS 2011, arXiv:1108.341
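For small examples, the definition of unambiguity above can be checked by brute force. The following Python sketch is illustrative only and is not taken from the paper. The helper names `preimages` and `is_unambiguous` are hypothetical, morphisms are represented as dictionaries from letters to strings, and the `allow_erasing` flag (whether a competing morphism may map a letter to the empty word) is an assumption, since the abstract does not state which morphisms count as competitors.

```python
from typing import Dict, Iterator


def preimages(word: str, image: str, allow_erasing: bool = True) -> Iterator[Dict[str, str]]:
    """Yield every morphism g (as a letter -> string dict) with g(word) == image."""

    def extend(i: int, pos: int, g: Dict[str, str]) -> Iterator[Dict[str, str]]:
        if i == len(word):
            # All letters are processed: accept only if the image is used up.
            if pos == len(image):
                yield dict(g)
            return
        letter = word[i]
        if letter in g:
            # The letter already has an image, which must occur at this position.
            if image.startswith(g[letter], pos):
                yield from extend(i + 1, pos + len(g[letter]), g)
            return
        shortest = 0 if allow_erasing else 1
        for end in range(pos + shortest, len(image) + 1):
            g[letter] = image[pos:end]
            yield from extend(i + 1, end, g)
            del g[letter]

    yield from extend(0, 0, {})


def is_unambiguous(h: Dict[str, str], word: str, allow_erasing: bool = True) -> bool:
    """Return True if no morphism other than h maps `word` to h(word)."""
    image = "".join(h[x] for x in word)
    restricted = {x: h[x] for x in set(word)}  # ignore letters not in the word
    return all(g == restricted for g in preimages(word, image, allow_erasing))
```

Under these assumptions, `is_unambiguous({"a": "x", "b": "y"}, "abba")` holds. The same 1-uniform morphism is ambiguous with respect to `"ab"` once erasing competitors are allowed, because mapping `a` to the empty word and `b` to `xy` yields the same image. The search is exponential in the worst case, so it is only suitable for experimenting with short words.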
A Parameterized Study of Maximum Generalized Pattern Matching Problems
The generalized function matching (GFM) problem has been intensively studied
starting with [Ehrenfeucht and Rozenberg, 1979]. Given a pattern p and a text
t, the goal is to find a mapping from the letters of p to non-empty substrings
of t, such that applying the mapping to p results in t. Very recently, the
problem has been investigated within the framework of parameterized complexity
[Fernau, Schmid, and Villanger, 2013].
In this paper we study the parameterized complexity of the optimization
variant of GFM (called Max-GFM), which has been introduced in [Amir and Nor,
2007]. Here, one is allowed to replace some of the pattern letters with some
special symbols "?", termed wildcards or don't cares, which can be mapped to an
arbitrary substring of the text. The goal is to minimize the number of
wildcards used.
We give a complete classification of the parameterized complexity of Max-GFM
and its variants under a wide range of parameterizations, such as the number
of occurrences of a letter in the text, the size of the text alphabet, the
number of occurrences of a letter in the pattern, the size of the pattern
alphabet, the maximum length of a string matched to any pattern letter, the
number of wildcards, and the maximum size of a string that a wildcard can be
mapped to.
Comment: to appear in Proc. IPEC'1
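To make the GFM and Max-GFM problem statements concrete, here is a minimal brute-force sketch in Python. It is not one of the algorithms from the paper. The function names `gfm_match` and `min_wildcards` are hypothetical, the mapping is not required to be injective, and each wildcard is assumed to match a non-empty substring; the abstract leaves both of these points open.

```python
from itertools import combinations
from typing import Dict, Optional

WILDCARD = "?"  # assumed not to occur as an ordinary pattern letter


def gfm_match(pattern: str, text: str) -> Optional[Dict[str, str]]:
    """Return a letter -> non-empty substring mapping turning `pattern` into
    `text`, or None if there is none. Each '?' may match any non-empty substring."""
    mapping: Dict[str, str] = {}

    def extend(i: int, pos: int) -> bool:
        if i == len(pattern):
            return pos == len(text)
        symbol = pattern[i]
        if symbol != WILDCARD and symbol in mapping:
            # A letter seen before must be replaced by the same substring again.
            image = mapping[symbol]
            return text.startswith(image, pos) and extend(i + 1, pos + len(image))
        for end in range(pos + 1, len(text) + 1):
            if symbol != WILDCARD:
                mapping[symbol] = text[pos:end]
            if extend(i + 1, end):
                return True
            mapping.pop(symbol, None)  # undo the tentative choice
        return False

    return dict(mapping) if extend(0, 0) else None


def min_wildcards(pattern: str, text: str) -> Optional[int]:
    """Smallest number of pattern positions that must become wildcards so that
    the pattern matches `text`, or None if no choice of wildcards works."""
    for k in range(len(pattern) + 1):
        for positions in combinations(range(len(pattern)), k):
            candidate = "".join(
                WILDCARD if i in positions else c for i, c in enumerate(pattern)
            )
            if gfm_match(candidate, text) is not None:
                return k
    return None
```

For example, `gfm_match("aba", "xyzxy")` finds the mapping `a` to `xy` and `b` to `z`, while `min_wildcards("aa", "xyz")` is 1, since `aa` can only produce texts of even length. Both functions take exponential time in the worst case, so they serve only as a reference for small inputs.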
Restricted ambiguity of erasing morphisms
A morphism h is called ambiguous for a string s if there
is another morphism that maps s to the same image as h; otherwise,
it is called unambiguous. In this paper, we examine some fundamental
problems on the ambiguity of erasing morphisms. We provide a detailed
analysis of so-called ambiguity partitions, and our main result uses this
concept to characterise those strings that have a morphism of strongly
restricted ambiguity. Furthermore, we demonstrate that there are strings
for which the set of unambiguous morphisms, depending on the size of
the target alphabet of these morphisms, is empty, finite or infinite. Finally,
we show that the problem of the existence of unambiguous erasing
morphisms is equivalent to some basic decision problems for nonerasing
multi-pattern languages.
Closure properties of pattern languages
Pattern languages are a well-established class of languages that is particularly popular in algorithmic learning theory, but very little is known about their closure properties. In the present paper we establish a large number of closure properties of the terminal-free pattern languages, and we characterise when the union of two terminal-free pattern languages is again a terminal-free pattern language. We demonstrate that the equivalent question for general pattern languages is characterised differently, and that it is linked to some of the most prominent open problems for pattern languages. We also provide fundamental insights into a well-known construction of E-pattern languages as unions of NE-pattern languages, and vice versa. © 2014 Springer International Publishing Switzerland
Weakly Unambiguous Morphisms
A nonerasing morphism sigma is said to be weakly unambiguous with respect to a word w if sigma is the only nonerasing morphism that can map w to sigma(w), i.e., there does not exist any other nonerasing morphism tau satisfying tau(w) = sigma(w). In the present paper, we wish to characterise those words with respect to which there exists such a morphism. This question is nontrivial if we consider so-called length-increasing morphisms, which map a word to an image that is strictly longer than the word. Our main result is a compact characterisation that holds for all morphisms with ternary or larger target alphabets. We also comprehensively describe those words that have a weakly unambiguous length-increasing morphism with a unary target alphabet, but we have to leave the problem open for binary alphabets, where we can merely give some non-characteristic conditions.
Bad news on decision problems for patterns
We study the inclusion problem for pattern languages, which
was shown to be undecidable by Jiang et al. (J. Comput. System Sci. 50,
1995). More precisely, Jiang et al. demonstrate that there is no effective
procedure deciding the inclusion for the class of all pattern languages
over all alphabets. Most applications of pattern languages, however, consider
classes over fixed alphabets, and therefore it is practically more
relevant to ask for the existence of alphabet-specific decision procedures.
Our first main result states that, for all but very particular cases, this
version of the inclusion problem is also undecidable. The second main
part of our paper disproves the prevalent conjecture on the inclusion
of so-called similar E-pattern languages, and it explains the devastating
consequences of this result for the intensive previous research on the
most prominent open decision problem for pattern languages, namely
the equivalence problem for general E-pattern languages.
Regular and context-free pattern languages over small alphabets
Pattern languages are generalisations of the copy language,
which is a standard textbook example of a context-sensitive and
non-context-free language. In this work, we investigate a counter-intuitive
phenomenon: with respect to alphabets of size 2 and 3, pattern languages
can be regular or context-free in an unexpected way. For this regularity
and context-freeness of pattern languages, we give several sufficient and
necessary conditions and improve known results.
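To illustrate how a pattern generalises the copy language, the sketch below tests membership of a word in a pattern language by backtracking. It is an illustration under stated assumptions and not a construction from the paper. The function name `in_pattern_language` is hypothetical, uppercase letters in the pattern are treated as variables and every other character as a terminal, and the `erasing` flag selects between E-pattern semantics (variables may be replaced by the empty word) and NE-pattern semantics.

```python
from typing import Dict


def in_pattern_language(pattern: str, word: str, erasing: bool = False) -> bool:
    """Return True if `word` results from `pattern` by replacing every variable
    (uppercase letter) consistently with a terminal string."""
    substitution: Dict[str, str] = {}

    def extend(i: int, pos: int) -> bool:
        if i == len(pattern):
            return pos == len(word)
        symbol = pattern[i]
        if not symbol.isupper():
            # Terminal symbol: it must appear literally in the word.
            return word.startswith(symbol, pos) and extend(i + 1, pos + 1)
        if symbol in substitution:
            # Repeated variable: reuse the string chosen at its first occurrence.
            value = substitution[symbol]
            return word.startswith(value, pos) and extend(i + 1, pos + len(value))
        shortest = 0 if erasing else 1
        for end in range(pos + shortest, len(word) + 1):
            substitution[symbol] = word[pos:end]
            found = extend(i + 1, end)
            del substitution[symbol]
            if found:
                return True
        return False

    return extend(0, 0)
```

With the pattern `XX`, the accepted words are exactly those of the form ww, so `in_pattern_language("XX", "abab")` holds and `in_pattern_language("XX", "aba")` does not. The empty word is accepted only when `erasing=True`.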
Patterns with bounded treewidth
We show that any parameter of patterns that is an upper
bound for the treewidth of appropriate encodings of patterns as relational
structures, if restricted to a constant, allows the membership problem
for pattern languages to be solved in polynomial time. Furthermore, we
identify a new such parameter, called the scope coincidence degree.