4,849 research outputs found

Improved genetic algorithm for the context-free grammatical inference

Author: Gietka Adrianna
Publication venue: 'Uniwersytetu Marii Curie-Sklodowskiej w Lublinie'
Publication date: 04/01/2015

Inductive learning of formal languages, often called grammatical inference, is an active area in machine learning and computational learning theory. By learning a language we understand finding the grammar of the language when some positive examples (words from the language) and negative examples (words that are not in the language) are given. Learning mechanisms use the natural language learning model: people master a language used by their environment by analysing positive and negative examples. The problem of inferring context-free languages (CFG) has both theoretical and practical motivations. Practical applications include pattern recognition (for example, finding DTD or XML schemas for XML documents) and speech recognition (the ability to infer context-free grammars for natural languages would enable speech recognition to modify its internal grammar on the fly). There were several attempts to find effective learning methods for context-free languages (for example [1,2,3,4,5]). In particular, Y. Sakakibara [3] introduced an interesting method of finding a context-free grammar in the Chomsky normal form with a minimal set of nonterminals. He used a tabular representation similar to the parse table used in the CYK algorithm, together with genetic algorithms. In this paper we present several adjustments to the algorithm suggested by Sakakibara. The adjustments are concerned mainly with the genetic algorithms used and are as follows:
– we introduce a method of creating the initial population which makes use of characteristic features of context-free grammars,
– new genetic operations are used (mutation with a path added, ‘die process’, ‘war/disease process’),
– a different definition of the fitness function is used,
– an effective compression of the structure of an individual in the population is suggested.
These changes make it possible to speed up the process of grammar generation and, what is more, to infer richer grammars than those considered in [3].
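The abstract above refers to the CYK parse table and to a fitness function over positive and negative examples without giving either in detail. The following is a minimal Python sketch of textbook CYK membership testing for a grammar in Chomsky normal form, plus one possible fitness score (the fraction of examples classified correctly). The function names, the dictionary encoding of rules, and the fitness definition are assumptions made for illustration; they are not the paper's tabular representation or its fitness function.

```python
from itertools import product


def cyk_accepts(word, unary, binary, start="S"):
    """Return True if the CNF grammar derives `word` (textbook CYK).

    unary:  dict mapping a terminal to the set of A with a rule A -> terminal
    binary: dict mapping (B, C) to the set of A with a rule A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # assumption: the empty word is not handled in this sketch
    # table[i][j] holds the nonterminals that derive word[i : i + j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unary.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b, c in product(left, right):
                    table[i][length - 1] |= binary.get((b, c), set())
    return start in table[0][n - 1]


def fitness(unary, binary, positives, negatives):
    """Hypothetical fitness: share of examples the grammar classifies correctly."""
    correct = sum(cyk_accepts(w, unary, binary) for w in positives)
    correct += sum(not cyk_accepts(w, unary, binary) for w in negatives)
    return correct / (len(positives) + len(negatives))


# Illustrative grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A X,  X -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
SCORE = fitness(UNARY, BINARY, ["ab", "aabb"], ["aab", "ba"])
```

In a genetic search, a score of this kind would be evaluated for every candidate grammar in the population; a real fitness function would typically also penalise grammar size.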

University of Maria Curie-Skłodowska (UMCS): Scientific e-Journals / Uniwersytet Marii Curie-Skłodowskiej: e-czasopisma naukowe

Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: case of grammatical inference

Author: Angluin
Bagchi
Bhalse
Choubey
Clark
Cleeremans
de la Higuera
Delgado
Dupont
D’Ulizia
Elman
Fu
Gallager
Gold
Graves
Grünwald
Hansen
Harrison
Higuera
Holland
Hrnčič
Iuspa
Jonyer
Li
Michalewicz
Pandey
Petasis
Rissanen
Roy
Saers
Sakakibara
Sivaraj
Solomonoff
Stevenson
Theeramunkongy
Valiant
Yang
Yoshinaka
Črepinšek
Publication venue: 'Elsevier BV'
Publication date: 17/05/2016

In this paper, a genetic algorithm with minimum description length (GAWMDL) is proposed for grammatical inference. The primary challenge in identifying a language of infinite cardinality from a finite set of examples is knowing when to generalize and when to specialize from the training data. The minimum description length principle, which is incorporated to address this issue, is discussed in this paper. Previously, the e-GRIDS learning model was proposed, which enjoyed the merits of the minimum description length principle, but it is limited to positive examples only. The proposed GAWMDL incorporates a traditional genetic algorithm and has a powerful global exploration capability that can exploit an optimum offspring. This is an effective approach for problems with a large search space, such as the grammatical inference problem. The computational capability of the genetic algorithm is not in question, but it still suffers from premature convergence, mainly due to a lack of population diversity. The proposed GAWMDL incorporates a bit-mask-oriented data structure that performs the reproduction operations: a mask is created, and then a Boolean-based procedure is applied to create offspring in a generative manner. The Boolean-based procedure is capable of introducing diversity into the population, hence alleviating premature convergence. The proposed GAWMDL is applied to context-free as well as regular languages of varying complexities. The computational experiments show that the GAWMDL finds an optimal or close-to-optimal grammar. A two-fold performance analysis has been performed. First, the GAWMDL has been evaluated against the elite mating pool genetic algorithm, which was proposed to introduce diversity and to address premature convergence. GAWMDL is also tested against the improved tabular representation algorithm. In addition, the authors evaluate the performance of the GAWMDL against a genetic algorithm not using the minimum description length principle. Statistical tests demonstrate the superiority of the proposed algorithm. Overall, the proposed GAWMDL algorithm greatly improves the performance in three main aspects: it maintains regularity of the data, alleviates premature convergence, and is capable of grammatical inference from both positive and negative corpora.
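The abstract describes reproduction through a bit mask followed by a Boolean procedure, but does not spell out the exact operators. As a rough illustration only, the sketch below shows a generic mask-based (uniform) crossover on bit strings stored as Python integers: where the mask bit is 1 the first child copies the first parent, elsewhere it copies the second parent, and the second child does the opposite. The function name, the integer encoding, and the choice of AND/OR/NOT are assumptions, not the GAWMDL procedure itself.

```python
import random


def mask_crossover(parent_a, parent_b, n_bits, rng=random):
    """Generic mask-based crossover on n_bits-wide bit strings (illustrative).

    Returns (child_a, child_b, mask). Assumes n_bits >= 1 and that both
    parents fit in n_bits bits.
    """
    full = (1 << n_bits) - 1          # all-ones word used to bound the NOT
    mask = rng.getrandbits(n_bits)    # random crossover mask
    inverse = ~mask & full
    child_a = (parent_a & mask) | (parent_b & inverse)
    child_b = (parent_b & mask) | (parent_a & inverse)
    return child_a, child_b, mask


# Example call with a seeded generator so the run is repeatable
CHILD_A, CHILD_B, MASK = mask_crossover(0b11110000, 0b00001111, 8,
                                        rng=random.Random(0))
```

Because each bit position of the two children is drawn from complementary parents, no genetic material is lost in a single crossover; diversity in this sketch comes only from the random mask.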

Nottingham ePrints

Nottingham eTheses

Crossref

Edge Hill University Research Information Repository

Middlesex University Research Repository

University of Missouri, St. Louis

Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: Case of grammatical inference

Author: Chaudhary Ankit
Kendall Graham
Mehrotra Deepti
Pandey Hari Mohan
Publication venue: IRL @ UMSL
Publication date: 01/12/2016

University of Missouri, St. Louis

Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: case of grammatical inference

Author: Chaudhary A.
Kendall G.
Mehrotra D.
Pandey H.
Publication venue: Elsevier
Publication date: 01/01/2016

Middlesex University Research Repository

Automatic Machine Learning by Pipeline Synthesis using Model-Based Reinforcement Learning and a Grammar

Author: Cho Kyunghyun
Drori Iddo
Freire Juliana
Krishnamurthy Yamuna
Lourenco Raoni
Rampin Remi
Silva Claudio
Publication date: 01/01/2019

Automatic machine learning is an important problem at the forefront of machine learning. The strongest AutoML systems are based on neural networks, evolutionary algorithms, and Bayesian optimization. Recently AlphaD3M reached state-of-the-art results with an order of magnitude speedup using reinforcement learning with self-play. In this work we extend AlphaD3M by using a pipeline grammar and a pre-trained model which generalizes from many different datasets and similar tasks. Our results demonstrate improved performance compared with our earlier work and existing methods on AutoML benchmark datasets for classification and regression tasks. In the spirit of reproducible research we make our data, models, and code publicly available. Comment: ICML Workshop on Automated Machine Learning

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg