498 research outputs found

Frequency vs. Association for Constraint Selection in Usage-Based Construction Grammar

Author: Dunn Jonathan
Publication date: 01/01/2019

A usage-based Construction Grammar (CxG) posits that slot-constraints generalize from common exemplar constructions. But what is the best model of constraint generalization? This paper evaluates competing frequency-based and association-based models across eight languages using a metric derived from the Minimum Description Length paradigm. The experiments show that association-based models produce better generalizations across all languages by a significant margin.

arXiv.org e-Print Archive

Crossref

UC Research Repository

Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: case of grammatical inference

Author: Chaudhary Ankit
Kendall Graham
Mehrotra Deepti
Pandey Hari Mohan
Publication venue: 'Elsevier BV'
Publication date: 17/05/2016

In this paper, a genetic algorithm with minimum description length (GAWMDL) is proposed for grammatical inference. The primary challenge in identifying a language of infinite cardinality from a finite set of examples is knowing when to generalize and when to specialize from the training data. This paper discusses how the incorporated minimum description length principle addresses this issue. Previously, the e-GRIDS learning model was proposed, which enjoyed the merits of the minimum description length principle, but it is limited to positive examples only. The proposed GAWMDL incorporates a traditional genetic algorithm and has a powerful global exploration capability that can exploit an optimum offspring. This is an effective approach for handling a problem with a large search space, such as the grammatical inference problem. The computational capability of the genetic algorithm is not in question, but it still suffers from premature convergence, mainly arising from a lack of population diversity. The proposed GAWMDL incorporates a bit-mask-oriented data structure that performs the reproduction operations: it creates the mask, and then a Boolean-based procedure is applied to create an offspring in a generative manner. The Boolean-based procedure is capable of introducing diversity into the population, hence alleviating premature convergence. The proposed GAWMDL is applied to context-free as well as regular languages of varying complexities. The computational experiments show that the GAWMDL finds an optimal or close-to-optimal grammar. A two-fold performance analysis has been performed. First, the GAWMDL has been evaluated against the elite mating pool genetic algorithm, which was proposed to introduce diversity and to address premature convergence. The GAWMDL is also tested against the improved tabular representation algorithm. In addition, the authors evaluate the performance of the GAWMDL against a genetic algorithm that does not use the minimum description length principle. Statistical tests demonstrate the superiority of the proposed algorithm. Overall, the proposed GAWMDL algorithm greatly improves performance in three main aspects: it maintains regularity of the data, alleviates premature convergence, and is capable of grammatical inference from both positive and negative corpora.

Nottingham ePrints

Nottingham eTheses

Crossref

Edge Hill University Research Information Repository

Middlesex University Research Repository

University of Missouri, St. Louis

Algorithms and implementation of functional dependency discovery in XML : a thesis presented in partial fulfilment of the requirements for the degree of Master of Information Sciences in Information Systems at Massey University

Author: Zhou Zheng
Publication venue: 'Massey University'
Publication date: 01/01/2006

1.1 Background. Following the advent of the web, there has been a great demand for data interchange between applications using internet infrastructure. XML (Extensible Markup Language) provides a structured representation of data empowered by broad adoption and easy deployment. As a subset of SGML (Standard Generalized Markup Language), XML has been standardized by the World Wide Web Consortium (W3C) [Bray et al., 2004]. XML is becoming the prevalent data exchange format on the World Wide Web and is increasingly significant in storing semi-structured data. After its initial release in 1996, it has evolved and been applied extensively in all fields where the exchange of structured documents in electronic form is required. With the growing popularity of XML, the issue of functional dependency in XML has recently received well-deserved attention. The driving force for the study of dependencies in XML is that they are as crucial to XML schema design as to relational database (RDB) design [Abiteboul et al., 1995].

Massey Research Online

Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: case of grammatical inference

Author: Chaudhary A.
Kendall G.
Mehrotra D.
Pandey H.
Publication venue: Elsevier
Publication date: 01/01/2016

Middlesex University Research Repository

Maintaining regularity and generalization in data using the minimum description length principle and genetic algorithm: Case of grammatical inference

Author: Chaudhary Ankit
Kendall Graham
Mehrotra Deepti
Pandey Hari Mohan
Publication venue: IRL @ UMSL
Publication date: 01/12/2016

University of Missouri, St. Louis

The Unsupervised Acquisition of a Lexicon from Continuous Speech

Author: de Marcken Carl
Publication date: 01/01/1995

We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency. Comment: 27 page technical report

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Structure induction by lossless graph compression

Author: Peshkin Leonid
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

This work is motivated by the necessity to automate the discovery of structure in vast and ever-growing collections of relational data commonly represented as graphs, for example genomic networks. A novel algorithm, dubbed Graphitour, for structure induction by lossless graph compression is presented and illustrated by a clear and broadly known case of nested structure in a DNA molecule. This work extends to graphs some well-established approaches to grammatical inference previously applied only to strings. The bottom-up graph compression problem is related to the maximum cardinality (non-bipartite) matching problem. The algorithm accepts a variety of graph types, including directed graphs and graphs with labeled nodes and arcs. The resulting structure could be used for representation and classification of graphs. Comment: 10 pages, 7 figures, 2 tables published in Proceedings of the Data Compression Conference, 200

arXiv.org e-Print Archive

CiteSeerX

Crossref

A syntactic approach to robot imitation learning using probabilistic activity grammars

Author: Demiris Y
Kim T-K
Lee K
Su Y
Publication venue: 'Elsevier BV'
Publication date: 01/12/2013

Spiral - Imperial College Digital Repository