1,337 research outputs found

    Developments from enquiries into the learnability of the pattern languages from positive data

    Get PDF
    AbstractThe pattern languages are languages that are generated from patterns, and were first proposed by Angluin as a non-trivial class that is inferable from positive data [D. Angluin, Finding patterns common to a set of strings, Journal of Computer and System Sciences 21 (1980) 46–62; D. Angluin, Inductive inference of formal languages from positive data, Information and Control 45 (1980) 117–135]. In this paper we chronologize some results that developed from the investigations on the inferability of the pattern languages from positive data

    Inferring descriptive generalisations of formal languages

    Get PDF
    In the present paper, we introduce a variant of Gold-style learners that is not required to infer precise descriptions of the languages in a class, but that must find descriptive patterns, i.e., optimal generalisations within a class of pattern languages. Our first main result characterises those indexed families of recursive languages that can be inferred by such learners, and we demonstrate that this characterisation shows enlightening connections to Angluin’s corresponding result for exact inference. Using a notion of descriptiveness that is restricted to the natural subclass of terminal-free E-pattern languages, we introduce a generic inference strategy, and our second main result characterises those classes of languages that can be generalised by this strategy. This characterisation demonstrates that there are major classes of languages that can be generalised in our model, but not be inferred by a normal Gold-style learner. Our corresponding technical considerations lead to deep insights of intrinsic interest into combinatorial and algorithmic properties of pattern languages

    Inferring descriptive generalisations of formal languages

    Get PDF
    In the present paper, we introduce a variant of Gold-style learners that is not required to infer precise descriptions of the languages in a class, but that must nd descriptive patterns, i. e., optimal generalisations within a class of pattern languages. Our rst main result characterises those indexed families of recursive languages that can be inferred by such learners, and we demonstrate that this characterisation shows enlightening connections to Angluin's corresponding result for exact inference. Furthermore, this result reveals that our model can be interpreted as an instance of a natural extension of Gold's model of language identi cation in the limit. Using a notion of descriptiveness that is restricted to the natural subclass of terminal-free E-pattern languages, we introduce a generic inference strategy, and our second main result characterises those classes of languages that can be generalised by this strategy. This characterisation demonstrates that there are major classes of languages that can be generalised in our model, but not be inferred by a normal Gold-style learner. Our corresponding technical considerations lead to insights of intrinsic interest into combinatorial and algorithmic properties of pattern languages

    : Méthodes d'Inférence Symbolique pour les Bases de Données

    Get PDF
    This dissertation is a summary of a line of research, that I wasactively involved in, on learning in databases from examples. Thisresearch focused on traditional as well as novel database models andlanguages for querying, transforming, and describing the schema of adatabase. In case of schemas our contributions involve proposing anoriginal languages for the emerging data models of Unordered XML andRDF. We have studied learning from examples of schemas for UnorderedXML, schemas for RDF, twig queries for XML, join queries forrelational databases, and XML transformations defined with a novelmodel of tree-to-word transducers.Investigating learnability of the proposed languages required us toexamine closely a number of their fundamental properties, often ofindependent interest, including normal forms, minimization,containment and equivalence, consistency of a set of examples, andfinite characterizability. Good understanding of these propertiesallowed us to devise learning algorithms that explore a possibly largesearch space with the help of a diligently designed set ofgeneralization operations in search of an appropriate solution.Learning (or inference) is a problem that has two parameters: theprecise class of languages we wish to infer and the type of input thatthe user can provide. We focused on the setting where the user inputconsists of positive examples i.e., elements that belong to the goallanguage, and negative examples i.e., elements that do not belong tothe goal language. In general using both negative and positiveexamples allows to learn richer classes of goal languages than usingpositive examples alone. However, using negative examples is oftendifficult because together with positive examples they may cause thesearch space to take a very complex shape and its exploration may turnout to be computationally challenging.Ce mĂ©moire est une courte prĂ©sentation d’une direction de recherche, Ă  laquelle j’ai activementparticipĂ©, sur l’apprentissage pour les bases de donnĂ©es Ă  partir d’exemples. Cette recherches’est concentrĂ©e sur les modĂšles et les langages, aussi bien traditionnels qu’émergents, pourl’interrogation, la transformation et la description du schĂ©ma d’une base de donnĂ©es. Concernantles schĂ©mas, nos contributions consistent en plusieurs langages de schĂ©mas pour les nouveaumodĂšles de bases de donnĂ©es que sont XML non-ordonnĂ© et RDF. Nous avons ainsi Ă©tudiĂ©l’apprentissage Ă  partir d’exemples des schĂ©mas pour XML non-ordonnĂ©, des schĂ©mas pour RDF,des requĂȘtes twig pour XML, les requĂȘtes de jointure pour bases de donnĂ©es relationnelles et lestransformations XML dĂ©finies par un nouveau modĂšle de transducteurs arbre-Ă -mot.Pour explorer si les langages proposĂ©s peuvent ĂȘtre appris, nous avons Ă©tĂ© obligĂ©s d’examinerde prĂšs un certain nombre de leurs propriĂ©tĂ©s fondamentales, souvent souvent intĂ©ressantespar elles-mĂȘmes, y compris les formes normales, la minimisation, l’inclusion et l’équivalence, lacohĂ©rence d’un ensemble d’exemples et la caractĂ©risation finie. Une bonne comprĂ©hension de cespropriĂ©tĂ©s nous a permis de concevoir des algorithmes d’apprentissage qui explorent un espace derecherche potentiellement trĂšs vaste grĂące Ă  un ensemble d’opĂ©rations de gĂ©nĂ©ralisation adaptĂ© Ă la recherche d’une solution appropriĂ©e.L’apprentissage (ou l’infĂ©rence) est un problĂšme Ă  deux paramĂštres : la classe prĂ©cise delangage que nous souhaitons infĂ©rer et le type d’informations que l’utilisateur peut fournir. Nousnous sommes placĂ©s dans le cas oĂč l’utilisateur fournit des exemples positifs, c’est-Ă -dire desĂ©lĂ©ments qui appartiennent au langage cible, ainsi que des exemples nĂ©gatifs, c’est-Ă -dire qui n’enfont pas partie. En gĂ©nĂ©ral l’utilisation Ă  la fois d’exemples positifs et nĂ©gatifs permet d’apprendredes classes de langages plus riches que l’utilisation uniquement d’exemples positifs. Toutefois,l’utilisation des exemples nĂ©gatifs est souvent difficile parce que les exemples positifs et nĂ©gatifspeuvent rendre la forme de l’espace de recherche trĂšs complexe, et par consĂ©quent, son explorationinfaisable

    The use of data-mining for the automatic formation of tactics

    Get PDF
    This paper discusses the usse of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpuses of proofs. We data-mine information from large proof corpuses to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques

    Inklusion von Patternsprachen und verwandte Probleme

    Get PDF
    A pattern is a word that consists of variables and terminal symbols. The pattern language that is generated by a pattern A is the set of all terminal words that can be obtained from A by uniform replacement of variables with terminal words. For example, the pattern A = a x y a x (where x and y are variables, and the letter a is a terminal symbol) generates the set of all words that have some word a x both as prefix and suffix (where these two occurrences of a x do not overlap). Due to their simple definition, pattern languages have various connections to a wide range of other areas in theoretical computer science and mathematics. Among these areas are combinatorics on words, logic, and the theory of free semigroups. On the other hand, many of the canonical questions in formal language theory are surprisingly difficult. The present thesis discusses various aspects of the inclusion problem of pattern languages. It can be divide in two parts. The first one examines the decidability of pattern languages with a limited number of variables and fixed terminal alphabets. In addition to this, the minimizability of regular expressions with repetition operators is studied. The second part deals with descriptive patterns, the smallest generalizations of arbitrary languages through pattern languages ("smallest" with respect to the inclusion relation). Main questions are the existence and the discoverability of descriptive patterns for arbitrary languages.Ein Pattern ist ein Wort aus Variablen und Terminalsymbolen. Die von einem Pattern A erzeugte Patternsprache ist die Menge aller Terminalwörter, die durch eine uniforme Ersetzung der Variablen in A durch Terminalwörter erzeugt werden können. So beschreibt das Pattern A = a x y a x (wobei x und y Variablen sind und a ein Terminal ist) die Menge aller Wörter, die ein Wort der Form a x sowohl als PrĂ€fix, als auch als Suffix haben (ohne dass sich diese beiden Vorkommen von a x ĂŒberlappen). Wegen ihrer einfachen Definition besitzen Patternsprachen eine Vielzahl von Verbindungen zu verschiedenen anderen Gebieten der theoretischen Informatik und Mathematik, unter anderem zur Wortkombinatorik, Logik und der Theorie freier Halbgruppen. Andererseits fĂŒhren viele der ĂŒblichen sprachtheoretischen Fragestellungen bei Patternsprachen zu kombinatorischen Problemen von ĂŒberraschender Schwierigkeit. Die vorliegende Dissertation widmet sich verschiedenen Aspekten des Inklusionsproblems von Patternsprachen und kann in zwei Teile unterteilt werden. Der erste Teil untersucht die Entscheidbarkeit des Inklusionsproblems fĂŒr Sprachen, die von Pattern mit beschrĂ€nkter Variablenzahl ĂŒber Terminalalphabeten von beschrĂ€nkter GrĂ¶ĂŸe erzeugt werden. DarĂŒber hinaus werden verschiedene Aspekte der Minimierbarkeit von regulĂ€ren AusdrĂŒcken mit RĂŒckreferenzen betrachtet. Der zweite Teil der Dissertation handelt von deskriptiven Pattern; d.h. denjenigen Pattern, die die (hinsichtlich der Inklusion) kleinsten Verallgemeinerungen einer gegebenen Sprache erzeugen. Hauptfragen sind hierbei die Existenz und die Auffindbarkeit deskriptiver Pattern fĂŒr beliebige Sprachen
    • 

    corecore