    A discontinuity in pattern inference

    This paper examines the learnability of a major subclass of E-pattern languages (also known as erasing or extended pattern languages) in Gold’s learning model. We show that the class of terminal-free E-pattern languages is inferrable from positive data if the corresponding terminal alphabet consists of three or more letters. Consequently, the recently presented negative result for binary alphabets is unique.
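
All of these abstracts rely on the same definition: a word belongs to the E-pattern language of a pattern if it arises by replacing each variable, consistently at every occurrence, with a possibly empty terminal word. As a rough illustration (not taken from the papers), the Python sketch below decides membership by backtracking. The function name, the token-list representation of patterns and the `erasing` switch (which turns on Angluin's nonerasing semantics when set to False) are assumptions made for this example.

```python
from typing import Dict, Sequence, Set


def in_e_pattern_language(
    pattern: Sequence[str],
    word: str,
    variables: Set[str],
    erasing: bool = True,
) -> bool:
    """Decide whether `word` can be obtained from `pattern` by uniformly
    replacing every variable with a terminal word.

    With erasing=True (E-patterns) a variable may be replaced by the empty
    word; with erasing=False (nonerasing patterns) it may not. Tokens in
    `variables` are variables; every other token is one terminal letter.
    Plain backtracking, so the running time is exponential in the worst case.
    """
    min_len = 0 if erasing else 1

    def search(i: int, j: int, binding: Dict[str, str]) -> bool:
        # i: position in the pattern, j: position in the word
        if i == len(pattern):
            return j == len(word)
        token = pattern[i]
        if token not in variables:
            # a terminal letter must match the next letter of the word
            return j < len(word) and word[j] == token and search(i + 1, j + 1, binding)
        if token in binding:
            # repeated variable: the same image must occur again at position j
            image = binding[token]
            return word.startswith(image, j) and search(i + 1, j + len(image), binding)
        # first occurrence: try every admissible image starting at position j
        for end in range(j + min_len, len(word) + 1):
            binding[token] = word[j:end]
            if search(i + 1, end, binding):
                return True
            del binding[token]
        return False

    return search(0, 0, {})
```

For example, with `V = {"x", "y"}`, `in_e_pattern_language(["x", "a", "y"], "a", V)` returns True because both variables can be erased, while the same call with `erasing=False` returns False.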

    A non-learnable class of E-pattern languages

    We investigate the inferrability of E-pattern languages (also known as extended or erasing pattern languages) from positive data in Gold’s learning model. As the main result, our analysis yields a negative outcome for the full class of E-pattern languages, and even for the subclass of terminal-free E-pattern languages, if the corresponding terminal alphabet consists of exactly two distinct letters. Furthermore, we present a positive result for a manifest subclass of terminal-free E-pattern languages. We point out that the considered problems are closely related to fundamental questions concerning the nondeterminism of E-pattern languages.

    On the learnability of E-pattern languages over small alphabets

    This paper deals with two well-discussed but largely open problems on E-pattern languages, also known as extended or erasing pattern languages: primarily, their learnability in Gold’s learning model and, secondarily, the decidability of their equivalence. As the main result, we show that the full class of E-pattern languages is not inferrable from positive data if the corresponding terminal alphabet consists of exactly three or of exactly four letters; this contrasts remarkably with the recent positive finding on the learnability of the subclass of terminal-free E-pattern languages for these alphabets. As a side effect of our reasoning, we reveal some particular example patterns that disprove a conjecture of Ohlebusch and Ukkonen (Theoretical Computer Science 186, 1997) on the decidability of the equivalence of E-pattern languages.
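
The abstract describes the equivalence problem as largely open, but non-equivalence of two E-pattern languages can always be witnessed by a single word. The hypothetical helpers below (not from the paper) compare two patterns on all words up to a length bound n over a fixed terminal alphabet. Finding no difference says nothing about longer words, so this kind of search can refute equivalence but never confirm it. Function names and the token-list encoding of patterns are assumptions of this sketch.

```python
from itertools import product
from typing import Iterator, Optional, Sequence, Set


def words_up_to(alphabet: str, n: int) -> Iterator[str]:
    """Yield every word over `alphabet` of length at most n, shortest first."""
    for length in range(n + 1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


def bounded_e_language(pattern: Sequence[str], variables: Set[str], alphabet: str, n: int) -> Set[str]:
    """All words of length <= n in the E-pattern language of `pattern`.

    Each variable ranges over all words of length <= n, including the empty
    word; longer images can only produce words longer than n.
    """
    names = sorted({t for t in pattern if t in variables})
    images = list(words_up_to(alphabet, n))
    result = set()
    for choice in product(images, repeat=len(names)):
        substitution = dict(zip(names, choice))
        # terminals are kept, variables are replaced by their chosen image
        word = "".join(substitution.get(t, t) for t in pattern)
        if len(word) <= n:
            result.add(word)
    return result


def find_difference(p: Sequence[str], q: Sequence[str], variables: Set[str],
                    alphabet: str, n: int) -> Optional[str]:
    """Return a shortest word of length <= n lying in exactly one of the two
    languages, or None if they agree on all words up to length n."""
    diff = bounded_e_language(p, variables, alphabet, n) ^ bounded_e_language(q, variables, alphabet, n)
    return min(diff, key=lambda w: (len(w), w)) if diff else None
```

For instance, with variables {x, y} and alphabet "ab", the patterns x y a and x a agree up to length 4 (both generate exactly the words ending in a), whereas x x and x already differ on the word "a".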

    Developments from enquiries into the learnability of the pattern languages from positive data

    The pattern languages are languages that are generated from patterns, and were first proposed by Angluin as a non-trivial class that is inferable from positive data [D. Angluin, Finding patterns common to a set of strings, Journal of Computer and System Sciences 21 (1980) 46–62; D. Angluin, Inductive inference of formal languages from positive data, Information and Control 45 (1980) 117–135]. In this paper we give a chronological account of some results that developed from the investigations into the inferability of the pattern languages from positive data.

    The unambiguity of segmented morphisms

    This paper studies the ambiguity of morphisms in free monoids. A morphism σ is said to be ambiguous with respect to a string α if there exists a morphism τ that differs from σ on some symbol occurring in α but nevertheless satisfies τ(α) = σ(α); if there is no such τ, then σ is called unambiguous with respect to α. Motivated by the recent initial paper on the ambiguity of morphisms, we introduce the definition of a so-called segmented morphism σₙ, which, for any n ∈ ℕ, maps every symbol of an infinite alphabet onto a word that consists of n distinct factors in ab⁺a, where a and b are different letters. For every n, we consider the set U(σₙ) of those finite strings over an infinite alphabet with respect to which σₙ is unambiguous, and we comprehensively describe its relation to every U(σₘ) with m ≠ n. Thus, our work features the first approach to a characterisation of sets of strings with respect to which certain fixed morphisms are unambiguous, and it leads to fairly counter-intuitive insights into the relations between such sets. Furthermore, it shows that, among the widely used homogeneous morphisms, most segmented morphisms are optimal in the sense that they are unambiguous for as large a set of strings as possible. Finally, our paper yields several major improvements of crucial techniques previously used in research on the ambiguity of morphisms.
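
To make these definitions concrete, here is a brute-force Python sketch; it is not the paper's technique. It builds one possible segmented morphism and searches for a witness τ of ambiguity. The indexing of the segments (symbol i receives the segments a b^(n(i-1)+k) a for k = 1..n), the encoding of symbols as positive integers and the choice to let τ range over all morphisms, including erasing ones, are assumptions of this example. The search is exponential and only suitable for very short strings α.

```python
from typing import Dict, Optional, Sequence


def segmented_image(symbol: int, n: int) -> str:
    """Image of symbol i >= 1 under a segmented morphism sigma_n.

    Hypothetical indexing: symbol i is mapped to the n segments
    a b^(n(i-1)+k) a for k = 1..n, so all segments of all symbols are
    pairwise distinct words in ab+a.
    """
    return "".join("a" + "b" * (n * (symbol - 1) + k) + "a" for k in range(1, n + 1))


def find_alternative(alpha: Sequence[int], sigma: Dict[int, str]) -> Optional[Dict[int, str]]:
    """Search for a morphism tau with tau(alpha) == sigma(alpha) that differs
    from sigma on some symbol of alpha. Returns such a tau, or None if sigma
    is unambiguous with respect to alpha. tau may erase symbols (assumption)."""
    target = "".join(sigma[x] for x in alpha)

    def search(i: int, j: int, tau: Dict[int, str]) -> Optional[Dict[int, str]]:
        if i == len(alpha):
            # a witness must reproduce the whole image and differ from sigma
            if j == len(target) and any(tau[x] != sigma[x] for x in tau):
                return dict(tau)
            return None
        x = alpha[i]
        if x in tau:
            # later occurrence: tau(x) must reappear at position j
            image = tau[x]
            if target.startswith(image, j):
                return search(i + 1, j + len(image), tau)
            return None
        # first occurrence: try every (possibly empty) image for x
        for end in range(j, len(target) + 1):
            tau[x] = target[j:end]
            found = search(i + 1, end, tau)
            if found is not None:
                return found
            del tau[x]
        return None

    return search(0, 0, {})


def is_unambiguous(alpha: Sequence[int], n: int) -> bool:
    """True if the sketch's sigma_n is unambiguous with respect to alpha."""
    sigma = {x: segmented_image(x, n) for x in set(alpha)}
    return find_alternative(alpha, sigma) is None
```

For example, with n = 1 the sketch reports σ₁ as unambiguous for the string 1 1 2 2 but ambiguous for 1 2, where τ can erase one symbol and map the other onto the whole image.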

    Discontinuities in pattern inference

    This paper deals with the inferrability of classes of E-pattern languages (also referred to as extended or erasing pattern languages) from positive data in Gold’s model of identification in the limit. The first main part of the paper shows that the recently presented negative result on terminal-free E-pattern languages over binary alphabets does not hold for other alphabet sizes, so that the full class of these languages is inferrable from positive data if and only if the corresponding terminal alphabet does not consist of exactly two distinct letters. The second main part yields the insight that the positive result on terminal-free E-pattern languages over alphabets with three or four letters cannot be extended to the class of general E-pattern languages. With regard to larger alphabets, the extensibility remains open. The proof methods developed for these main results do not directly discuss the (non-)existence of appropriate learning strategies; instead, they deal with structural properties of classes of E-pattern languages and, in particular, with the problem of finding telltales for these languages. It is shown that the inferrability of classes of E-pattern languages is closely connected to some problems on the ambiguity of morphisms, so that the technical contributions of the paper largely consist of combinatorial insights into morphisms in word monoids.
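
The telltales mentioned here come from Angluin's characterisation of inferrability from positive data (Information and Control 45, 1980, cited in an abstract above). Stated for an indexed family of nonempty recursive languages, such as a class of E-pattern languages over a fixed terminal alphabet, it reads roughly as follows; the notation is ours, not the paper's.

```latex
Let $(L_i)_{i \ge 1}$ be an indexed family of nonempty recursive languages.
The family is inferrable from positive data in Gold's model if and only if
there is an effective procedure that, on input $i$, enumerates a finite set
$T_i$ (a \emph{telltale} for $L_i$) such that
\[
  T_i \subseteq L_i
  \quad\text{and}\quad
  \forall j \ge 1:\;
  T_i \subseteq L_j \;\Longrightarrow\; L_j \not\subsetneq L_i .
\]
```

Informally, once a learner has seen the telltale of $L_i$, it can safely conjecture $L_i$ without the risk of overgeneralising to a language that strictly contains the target.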

    Inklusion von Patternsprachen und verwandte Probleme (Inclusion of pattern languages and related problems)

    A pattern is a word that consists of variables and terminal symbols. The pattern language generated by a pattern A is the set of all terminal words that can be obtained from A by uniformly replacing its variables with terminal words. For example, the pattern A = a x y a x (where x and y are variables and the letter a is a terminal symbol) generates the set of all words that have some word of the form a x both as a prefix and as a suffix, where these two occurrences of a x do not overlap. Due to their simple definition, pattern languages have connections to a wide range of other areas of theoretical computer science and mathematics, among them combinatorics on words, logic, and the theory of free semigroups. On the other hand, for pattern languages many of the canonical questions of formal language theory lead to combinatorial problems of surprising difficulty. The present thesis discusses various aspects of the inclusion problem of pattern languages and can be divided into two parts. The first part examines the decidability of the inclusion problem for languages generated by patterns with a bounded number of variables over terminal alphabets of bounded size. In addition, various aspects of the minimizability of regular expressions with backreferences are studied. The second part deals with descriptive patterns, i.e., patterns that generate the smallest generalizations of a given language by a pattern language ("smallest" with respect to inclusion). The main questions are the existence and the discoverability of descriptive patterns for arbitrary languages.
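
The characterisation of the language of a x y a x given above can be checked by brute force on short words. The Python sketch below assumes that x and y may also be replaced by the empty word (erasing semantics) and uses the terminal alphabet {a, b}; both are choices made for this illustration, and the function names are hypothetical. It compares the words generated by the pattern with the words that have a non-overlapping prefix and suffix of the form a x.

```python
from itertools import product
from typing import List, Set


def all_words(n: int, alphabet: str = "ab") -> List[str]:
    """Every word over `alphabet` of length at most n."""
    return ["".join(p) for k in range(n + 1) for p in product(alphabet, repeat=k)]


def generated_by_axyax(n: int, alphabet: str = "ab") -> Set[str]:
    """Words of length <= n obtained from a x y a x by uniform, possibly
    erasing substitution; images longer than n cannot yield such short words."""
    images = all_words(n, alphabet)
    result = set()
    for x in images:
        for y in images:
            w = "a" + x + y + "a" + x
            if len(w) <= n:
                result.add(w)
    return result


def has_nonoverlapping_ax_border(w: str) -> bool:
    """True if some word u = a x is both a prefix and a suffix of w and the
    two occurrences do not overlap, i.e. 2|u| <= |w|."""
    return any(
        w[0] == "a" and w.endswith(w[:m])
        for m in range(1, len(w) // 2 + 1)
    )


if __name__ == "__main__":
    n = 7
    by_pattern = generated_by_axyax(n)
    by_border = {w for w in all_words(n) if has_nonoverlapping_ax_border(w)}
    print(by_pattern == by_border)  # True: both descriptions agree up to length 7
```

The agreement is no accident: if u = a x is such a border of w, then choosing y as the middle part between the two occurrences of u rewrites w as a x y a x, and conversely.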