Improved genetic algorithm for the context-free grammatical inference

Author: Gietka Adrianna
Publication venue: 'Uniwersytetu Marii Curie-Sklodowskiej w Lublinie'
Publication date: 04/01/2015

Inductive learning of formal languages, often called grammatical inference, is an active area in machine learning and computational learning theory. By learning a language we understand finding the grammar of the language when some positive examples (words from the language) and negative examples (words that are not in the language) are given. Learning mechanisms use the natural language learning model: people master a language used by their environment by analysing positive and negative examples. The problem of inferring context-free languages (CFG) has both theoretical and practical motivations. Practical applications include pattern recognition (for example, finding DTD or XML schemas for XML documents) and speech recognition (the ability to infer context-free grammars for natural languages would enable a speech recognizer to modify its internal grammar on the fly). There have been several attempts to find effective learning methods for context-free languages (for example [1,2,3,4,5]). In particular, Y. Sakakibara [3] introduced an interesting method of finding a context-free grammar in Chomsky normal form with a minimal set of nonterminals. He used a tabular representation similar to the parse table used in the CYK algorithm, together with genetic algorithms. In this paper we present several adjustments to the algorithm suggested by Sakakibara. The adjustments are concerned mainly with the genetic algorithms used and are as follows:
– we introduce a method of creating the initial population which makes use of characteristic features of context-free grammars,
– new genetic operations are used (mutation with a path added, ‘die process’, ‘war/disease process’),
– a different definition of the fitness function is used,
– an effective compression of the structure of an individual in the population is suggested.
These changes speed up the process of grammar generation and, what is more, make it possible to infer richer grammars than those considered in [3].
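To make the CYK-style parse table mentioned in the abstract concrete, the following is a minimal sketch of a membership test for a grammar in Chomsky normal form, followed by a plain accuracy score over positive and negative examples. The names `cyk_accepts`, `fitness`, `UNARY` and `BINARY`, the dictionary encoding of the rules, and the accuracy-based score are all assumptions made for illustration; the paper's own fitness function and individual encoding are not given in the abstract and may well differ.

```python
from itertools import product

def cyk_accepts(word, unary, binary, start="S"):
    """Return True if `word` is derivable from `start` in a CNF grammar.

    unary:  dict terminal -> set of nonterminals A with a rule A -> terminal
    binary: dict (B, C)   -> set of nonterminals A with a rule A -> B C
    (This encoding is a hypothetical choice for the sketch.)
    """
    n = len(word)
    if n == 0:
        return False  # the sketch ignores the empty word
    # table[i][l - 1] holds the nonterminals deriving word[i:i + l]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(unary.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for b, c in product(left, right):
                    table[i][length - 1] |= binary.get((b, c), set())
    return start in table[0][n - 1]

def fitness(unary, binary, positives, negatives):
    """Fraction of examples classified correctly (an assumed, simple score)."""
    ok = sum(cyk_accepts(w, unary, binary) for w in positives)
    ok += sum(not cyk_accepts(w, unary, binary) for w in negatives)
    return ok / (len(positives) + len(negatives))

# Example grammar for a^n b^n (n >= 1):
#   S -> A B | A T,  T -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "T"): {"S"}, ("S", "B"): {"T"}}

score = fitness(UNARY, BINARY, ["ab", "aabb"], ["aab", "ba", "abab"])
```

A genetic search of the kind described would evaluate many candidate rule sets with a score of this sort; here the hand-written grammar classifies all five example words correctly.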

University of Maria Curie-Skłodowska (UMCS): Scientific e-Journals / Uniwersytet Marii Curie-Skłodowskiej: e-czasopisma naukowe

Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: case of grammatical inference

Author: Chaudhary Ankit
Kendall Graham
Mehrotra Deepti
Pandey Hari Mohan
Publication venue: 'Elsevier BV'
Publication date: 17/05/2016

In this paper, a genetic algorithm with minimum description length (GAWMDL) is proposed for grammatical inference. The primary challenge in identifying a language of infinite cardinality from a finite set of examples is knowing when to generalize and when to specialize the training data. This paper discusses how the incorporated minimum description length principle addresses this issue. Previously, the e-GRIDS learning model was proposed, which enjoyed the merits of the minimum description length principle, but it is limited to positive examples only. The proposed GAWMDL incorporates a traditional genetic algorithm and has a powerful global exploration capability that can exploit an optimum offspring. This is an effective approach to problems with a large search space, such as the grammatical inference problem. The computational capability of the genetic algorithm is not in question, but it still suffers from premature convergence, mainly due to a lack of population diversity. The proposed GAWMDL incorporates a bit-mask-oriented data structure that performs the reproduction operations: it creates the mask, and then a Boolean-based procedure is applied to create offspring in a generative manner. The Boolean-based procedure is capable of introducing diversity into the population, hence alleviating premature convergence. The proposed GAWMDL is applied to context-free as well as regular languages of varying complexities. The computational experiments show that the GAWMDL finds an optimal or close-to-optimal grammar. A two-fold performance analysis has been performed. First, the GAWMDL has been evaluated against the elite mating pool genetic algorithm, which was proposed to introduce diversity and to address premature convergence. The GAWMDL is also tested against the improved tabular representation algorithm. In addition, the authors evaluate the performance of the GAWMDL against a genetic algorithm that does not use the minimum description length principle. Statistical tests demonstrate the superiority of the proposed algorithm. Overall, the proposed GAWMDL algorithm greatly improves performance in three main respects: it maintains regularity of the data, alleviates premature convergence, and is capable of grammatical inference from both positive and negative corpora.
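The bit-mask reproduction step described above can be pictured with a generic mask-based crossover on bit strings. The sketch below is only an assumed illustration: the function name `mask_crossover`, the integer encoding of chromosomes, and the particular AND/OR/NOT combination are not taken from the paper, whose exact Boolean procedure is not specified in the abstract.

```python
import random

def mask_crossover(p1, p2, n_bits, rng=random):
    """Combine two parents, encoded as n_bits-wide integers, through a mask.

    Where the mask bit is 1 the first child copies parent 1 and the second
    child copies parent 2; where it is 0 the roles are swapped.
    """
    full = (1 << n_bits) - 1          # all-ones word of the chromosome width
    mask = rng.getrandbits(n_bits)    # random crossover mask
    inv = ~mask & full                # bitwise NOT restricted to n_bits
    child1 = (p1 & mask) | (p2 & inv)
    child2 = (p2 & mask) | (p1 & inv)
    return child1, child2, mask

rng = random.Random(0)                # seeded only to make the run repeatable
parent1, parent2 = 0b11110000, 0b10101010
c1, c2, m = mask_crossover(parent1, parent2, 8, rng)
print(f"mask={m:08b} child1={c1:08b} child2={c2:08b}")
```

Because every bit position of the two children is a permutation of the parents' bits at that position, no genetic material is lost in this variant; diversity comes from the random mask rather than from a fixed cut point.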

Nottingham ePrints

Nottingham eTheses

Crossref

Edge Hill University Research Information Repository

Middlesex University Research Repository

University of Missouri, St. Louis

Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: Case of grammatical inference

Author: Chaudhary Ankit
Kendall Graham
Mehrotra Deepti
Pandey Hari Mohan
Publication venue: IRL @ UMSL
Publication date: 01/12/2016

University of Missouri, St. Louis

Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: case of grammatical inference

Author: Chaudhary A.
Kendall G.
Mehrotra D.
Pandey H.
Publication venue: Elsevier
Publication date: 01/01/2016

Middlesex University Research Repository

Grammatical inference of directed acyclic graph languages with polynomial time complexity

Author: Calera-Rubio Jorge
Gallego Antonio-Javier
López Rodríguez Damián
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018

In this paper we study the learning of graph languages. We extend the well-known classes of k-testable and k-testable in the strict sense languages to directed graph languages. We propose a grammatical inference algorithm to learn the class of directed acyclic k-testable in the strict sense graph languages. The algorithm runs in polynomial time and identifies this class of languages from positive data. We study its efficiency under several criteria, and perform a comprehensive experimentation with four datasets to show the validity of the method. Many fields, from pattern recognition to data compression, can take advantage of these results.

Gallego, A.; López Rodríguez, D.; Calera-Rubio, J. (2018). Grammatical inference of directed acyclic graph languages with polynomial time complexity. Journal of Computer and System Sciences. 95:19-34. https://doi.org/10.1016/j.jcss.2017.12.002
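For orientation, the string case that the paper generalizes can be sketched briefly. The code below learns a k-testable in the strict sense model from positive strings by collecting the observed prefixes and suffixes of length k-1 and the observed segments of length k. It is one simple formulation for strings, assuming k >= 2; the function names `learn_ktss` and `accepts` and the separate handling of short strings are choices made for this sketch, and it is not the graph algorithm proposed in the paper.

```python
def learn_ktss(sample, k):
    """Collect prefixes, suffixes and k-segments from positive strings."""
    prefixes, suffixes, segments, short = set(), set(), set(), set()
    for w in sample:
        if len(w) < k:
            short.add(w)              # too short to contain a k-segment
            continue
        prefixes.add(w[:k - 1])
        suffixes.add(w[len(w) - k + 1:])
        segments.update(w[i:i + k] for i in range(len(w) - k + 1))
    return prefixes, suffixes, segments, short

def accepts(model, w, k):
    """Accept w if its prefix, suffix and every k-segment were observed."""
    prefixes, suffixes, segments, short = model
    if len(w) < k:
        return w in short
    return (w[:k - 1] in prefixes
            and w[len(w) - k + 1:] in suffixes
            and all(w[i:i + k] in segments for i in range(len(w) - k + 1)))

model = learn_ktss(["ab", "abab"], 2)
print(accepts(model, "ababab", 2), accepts(model, "abba", 2))
```

Learning is a single pass over the sample, which is why this family of languages can be identified from positive data in polynomial time in the string setting.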

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

RiuNet

CLiFF Notes: Research In Natural Language Processing at the University of Pennsylvania

Author: Faculty & Graduate Students
Publication venue: ScholarlyCommons
Publication date: 01/03/1992

The Computational Linguistics Feedback Forum (CLiFF) is a group of students and faculty who gather once a week to discuss the members' current research. As the word feedback suggests, the group's purpose is the sharing of ideas. The group also promotes interdisciplinary contacts between researchers who share an interest in Cognitive Science. There is no single theme describing the research in Natural Language Processing at Penn. There is work done in CCG, Tree adjoining grammars, intonation, statistical methods, plan inference, instruction understanding, incremental interpretation, language acquisition, syntactic parsing, causal reasoning, free word order languages, ... and many other areas. With this in mind, rather than trying to summarize the varied work currently underway here at Penn, we suggest reading the following abstracts to see how the students and faculty themselves describe their work. Their abstracts illustrate the diversity of interests among the researchers, explain the areas of common interest, and describe some very interesting work in Cognitive Science. This report is a collection of abstracts from both faculty and graduate students in Computer Science, Psychology and Linguistics. We pride ourselves on the close working relations between these groups, as we believe that the communication among the different departments and the ongoing inter-departmental research not only improves the quality of our work, but makes much of that work possible.

ScholarlyCommons@Penn

Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: case of grammatical inference

Author: Chaudhary Ankit
Kendall Graham
Mehrotra Deepti
Pandey Hari Mohan
Publication venue: 'Elsevier BV'
Publication date

Repository@Nottingham

How do Individuals Interpret Multiple Conceptual Models? A Theory of Combined Ontological Completeness and Overlap

Author: Green Peter
Recker Jan
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2019

When analyzing or designing information systems, users often work with multiple conceptual models because each model articulates a different, partial aspect of a real-world domain. However, the available research in this area has largely studied the use of single modeling artifacts only. We develop a new theory about interpreting multiple conceptual models that details propositions for evaluating how individuals select, understand, and perceive the usefulness of multiple conceptual models. We detail implications of our theory development for empirical research on conceptual modeling. We also outline practical contributions for the design of conceptual models and for choosing models for systems analysis and design tasks. Finally, to stimulate research that builds on our theory, we illustrate procedures for enacting our theory and discuss a range of empirically relevant boundary conditions.

Kölner UniversitätsPublikationsServer

Queensland University of Technology ePrints Archive

AIS Electronic Library (AISeL)