A Parameterized Study of Maximum Generalized Pattern Matching Problems
The generalized function matching (GFM) problem has been intensively studied
starting with [Ehrenfeucht and Rozenberg, 1979]. Given a pattern p and a text
t, the goal is to find a mapping from the letters of p to non-empty substrings
of t, such that applying the mapping to p results in t. Very recently, the
problem has been investigated within the framework of parameterized complexity
[Fernau, Schmid, and Villanger, 2013].
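To make the GFM definition above concrete, here is a minimal backtracking sketch in Python. It is an illustration only, not an algorithm from the paper: the function name `gfm_match` is hypothetical, every pattern character is treated as a variable letter, and the search takes exponential time in the worst case.

```python
def gfm_match(pattern, text):
    """Return a dict mapping each pattern letter to a non-empty substring of
    `text` such that substituting the images into `pattern` yields `text`,
    or None if no such mapping exists (illustrative brute force)."""

    def solve(i, pos, mapping):
        # All pattern letters consumed: succeed only if the text is used up.
        if i == len(pattern):
            return dict(mapping) if pos == len(text) else None
        letter = pattern[i]
        if letter in mapping:
            # Letter already bound: its image must occur at the current position.
            image = mapping[letter]
            if text.startswith(image, pos):
                return solve(i + 1, pos + len(image), mapping)
            return None
        # Unbound letter: try every non-empty image, leaving at least one
        # text character for each remaining pattern position.
        max_end = len(text) - (len(pattern) - i - 1)
        for end in range(pos + 1, max_end + 1):
            mapping[letter] = text[pos:end]
            result = solve(i + 1, end, mapping)
            if result is not None:
                return result
            del mapping[letter]
        return None

    return solve(0, 0, {})


print(gfm_match("aba", "xyzxy"))  # {'a': 'xy', 'b': 'z'}
print(gfm_match("aa", "xyz"))     # None
```

Note that this sketch does not require the mapping to be injective: two different pattern letters may receive the same image.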
In this paper we study the parameterized complexity of the optimization
variant of GFM (called Max-GFM), which has been introduced in [Amir and Nor,
2007]. Here, one is allowed to replace some of the pattern letters with some
special symbols "?", termed wildcards or don't cares, which can be mapped to an
arbitrary substring of the text. The goal is to minimize the number of
wildcards used.
We give a complete classification of the parameterized complexity of Max-GFM
and its variants under a wide range of parameterizations, such as the number
of occurrences of a letter in the text, the size of the text alphabet, the
number of occurrences of a letter in the pattern, the size of the pattern
alphabet, the maximum length of a string matched to any pattern letter, the
number of wildcards and the maximum size of a string that a wildcard can be
mapped to.
Comment: to appear in Proc. IPEC'1
Bad news on decision problems for patterns
We study the inclusion problem for pattern languages, which
is shown to be undecidable by Jiang et al. (J. Comput. System Sci. 50,
1995). More precisely, Jiang et al. demonstrate that there is no effective
procedure deciding the inclusion for the class of all pattern languages
over all alphabets. Most applications of pattern languages, however, consider
classes over fixed alphabets, and therefore it is practically more
relevant to ask for the existence of alphabet-specific decision procedures.
Our first main result states that, for all but very particular cases, this
version of the inclusion problem is also undecidable. The second main
part of our paper disproves the prevalent conjecture on the inclusion
of so-called similar E-pattern languages, and it explains the devastating
consequences of this result for the intensive previous research on the
most prominent open decision problem for pattern languages, namely
the equivalence problem for general E-pattern languages.
Regular and context-free pattern languages over small alphabets
Pattern languages are generalisations of the copy language,
which is a standard textbook example of a context-sensitive and
non-context-free language. In this work, we investigate a counter-intuitive
phenomenon: with respect to alphabets of size 2 and 3, pattern languages
can be regular or context-free in an unexpected way. For this regularity
and context-freeness of pattern languages, we give several sufficient and
necessary conditions and improve known results.
Unambiguous morphic images of strings
Motivated by the research on pattern languages, we study a fundamental combinatorial question on morphisms in free semigroups: With regard to any string α over some alphabet we ask for the existence
of a morphism σ such that σ(α) is unambiguous, i.e. there is no morphism ρ with ρ ≠ σ and ρ(α) = σ(α). Our main result shows that a rich and natural class of strings is provided with unambiguous morphic images.
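As a small illustration of the definition of unambiguity, the following Python sketch enumerates every morphism that maps a string α onto a given word and checks whether σ is the only one. The names `preimages` and `is_unambiguous` are hypothetical. Whether images may be empty is exposed as the flag `allow_empty`, since the abstract does not settle this; the default of permitting empty images is an assumption. The search is brute force and only suitable for short strings.

```python
def preimages(alpha, word, allow_empty=True):
    """Yield every assignment (dict: symbol -> string) that, applied letter by
    letter to `alpha`, produces exactly `word`."""

    def solve(i, pos, mapping):
        if i == len(alpha):
            if pos == len(word):
                yield dict(mapping)
            return
        sym = alpha[i]
        if sym in mapping:
            # Symbol already bound: its image must occur at the current position.
            image = mapping[sym]
            if word.startswith(image, pos):
                yield from solve(i + 1, pos + len(image), mapping)
            return
        # Unbound symbol: try every candidate image starting at `pos`.
        start = pos if allow_empty else pos + 1
        for end in range(start, len(word) + 1):
            mapping[sym] = word[pos:end]
            yield from solve(i + 1, end, mapping)
            del mapping[sym]

    yield from solve(0, 0, {})


def is_unambiguous(alpha, sigma, allow_empty=True):
    """True if sigma is the only morphism mapping alpha onto sigma(alpha)."""
    image = "".join(sigma[s] for s in alpha)
    # sigma itself is always found, so uniqueness means exactly one hit.
    return len(list(preimages(alpha, image, allow_empty))) == 1


print(is_unambiguous("aa", {"a": "x"}))              # True
print(is_unambiguous("abab", {"a": "x", "b": "y"}))  # False
```

In the second call, the image "xyxy" is also produced by mapping a to "xy" and b to the empty string (or the other way round), so the image is ambiguous once empty images are allowed.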
On the equivalence problem for E-pattern languages over small alphabets
We contribute new facets to the discussion on the equivalence
problem for E-pattern languages (also referred to as extended or
erasing pattern languages). This fundamental open question asks for the
existence of a computable function that, given any pair of patterns, decides
whether or not they generate the same language. Our main result
disproves Ohlebusch and Ukkonen’s conjecture (Theoretical Computer
Science 186, 1997) on the equivalence problem; the respective argumentation,
which largely deals with the nondeterminism of pattern languages,
is restricted to terminal alphabets with at most four distinct letters.
On the learnability of E-pattern languages over small alphabets
This paper deals with two well-discussed, but largely open
problems on E-pattern languages, also known as extended or erasing
pattern languages: primarily, the learnability in Gold’s learning model
and, secondarily, the decidability of the equivalence. As the main result,
we show that the full class of E-pattern languages is not inferrable from
positive data if the corresponding terminal alphabet consists of exactly
three or of exactly four letters – an insight that remarkably contrasts
with the recent positive finding on the learnability of the subclass of
terminal-free E-pattern languages for these alphabets. As a side-effect of
our reasoning thereon, we reveal some particular example patterns that
disprove a conjecture of Ohlebusch and Ukkonen (Theoretical Computer
Science 186, 1997) on the decidability of the equivalence of E-pattern
languages.
Combinatorics and Algorithmics of Strings
Edited in cooperation with Robert Mercaş
Strings (aka sequences or words) form the most basic and natural data structure. They occur whenever information is electronically transmitted (as bit streams), when natural language text is spoken or written down (as words over, for example, the Latin alphabet), in the process of heredity transmission in living cells (through DNA sequences) or in protein synthesis (as sequences of amino acids), and in many other contexts. Given this universal form of representing information, the need to process strings is apparent and is actually a core purpose of computer use. Algorithms to efficiently search through, analyze, (de-)compress, match, encode and decode strings are therefore of chief interest. Combinatorial problems about strings lie at the core of such algorithmic questions. Many such combinatorial problems are common to the string processing efforts in the different fields of application.
http://drops.dagstuhl.de/opus/volltexte/2014/4552
Unambiguous morphic images of strings
We study a fundamental combinatorial problem on morphisms in free semigroups: With
regard to any string α over some alphabet we ask for the existence of a morphism σ such
that σ(α) is unambiguous, i.e. there is no morphism τ with τ(i) ≠ σ(i) for some symbol
i in α and, nevertheless, τ(α) = σ(α). As a consequence of its elementary nature, this
question shows a variety of connections to those topics in discrete mathematics which
are based on finite strings and morphisms such as pattern languages, equality sets and,
thus, the Post Correspondence Problem.
Our studies demonstrate that the existence of unambiguous morphic images
essentially depends on the structure of α: We introduce a partition of the set of all finite
strings into those that are decomposable (referred to as prolix) in a particular manner
and those that are indecomposable (called succinct). This partition, which is also known
to be of major importance for the research on pattern languages and on finite fixed
points of morphisms, allows us to formulate our main result, according to which a string α
can be mapped by an injective morphism onto an unambiguous image if and only if α is
succinct.