Inducing Probabilistic Grammars by Bayesian Model Merging
We describe a framework for inducing probabilistic grammars from corpora of
positive samples. First, samples are {\em incorporated} by adding ad-hoc rules
to a working grammar; subsequently, elements of the model (such as states or
nonterminals) are {\em merged} to achieve generalization and a more compact
representation. The choice of what to merge and when to stop is governed by the
Bayesian posterior probability of the grammar given the data, which formalizes
a trade-off between a close fit to the data and a default preference for
simpler models (`Occam's Razor'). The general scheme is illustrated using three
types of probabilistic grammars: Hidden Markov models, class-based n-grams,
and stochastic context-free grammars.

Comment: To appear in Grammatical Inference and Applications, Second
International Colloquium on Grammatical Inference; Springer Verlag, 1994. 13
pages
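The posterior-guided merge loop described above can be sketched as follows. This is a minimal, illustrative Python sketch for a class-based bigram model: states start as one class per word type ("incorporation") and are greedily merged while the Bayesian posterior improves. The `alpha` prior weight (a simple per-class cost playing the role of Occam's Razor) and the exhaustive pairwise greedy search are assumptions for illustration, not the paper's exact prior or search strategy.

```python
import math
from collections import Counter

def log_likelihood(seq, class_of):
    """Class-based bigram model: P(w2 | w1) = P(c(w2) | c(w1)) * P(w2 | c(w2)),
    with maximum-likelihood estimates taken from the sequence itself.
    The first token is treated as conditioning context only."""
    bigrams = list(zip(seq, seq[1:]))
    pair_counts = Counter((class_of[a], class_of[b]) for a, b in bigrams)
    prev_counts = Counter(class_of[a] for a, _ in bigrams)
    word_counts = Counter(b for _, b in bigrams)
    succ_counts = Counter(class_of[b] for _, b in bigrams)
    ll = 0.0
    for a, b in bigrams:
        ca, cb = class_of[a], class_of[b]
        ll += math.log(pair_counts[(ca, cb)] / prev_counts[ca])  # class transition
        ll += math.log(word_counts[b] / succ_counts[cb])         # word emission
    return ll

def log_posterior(seq, classes, alpha):
    """Posterior = prior * likelihood in log space. The prior charges alpha
    nats per class, so merges win unless the fit drops by more than alpha."""
    class_of = {w: i for i, cls in enumerate(classes) for w in cls}
    return -alpha * len(classes) + log_likelihood(seq, class_of)

def merge_search(seq, alpha=2.0):
    """Incorporate one class per word type, then greedily apply the single
    best merge until no candidate merge improves the posterior."""
    classes = [frozenset([w]) for w in sorted(set(seq))]
    best = log_posterior(seq, classes, alpha)
    while len(classes) > 1:
        candidates = []
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                cand = [c for k, c in enumerate(classes) if k not in (i, j)]
                cand.append(classes[i] | classes[j])
                candidates.append((log_posterior(seq, cand, alpha), cand))
        score, cand = max(candidates, key=lambda t: t[0])
        if score <= best:
            break  # no merge helps: Occam gain no longer pays for the fit loss
        best, classes = score, cand
    return classes, best
```

On a toy sequence such as `"a x a y a x a y a x".split()`, where `x` and `y` are distributionally interchangeable, the search merges `x` and `y` into one class (no likelihood is lost, and the prior improves) but refuses to merge `a` in as well, illustrating the fit-versus-simplicity trade-off.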
Unsupervised Statistical Learning of Context-free Grammar
In this paper, we address the problem of inducing a weighted context-free grammar (WCFG) from given data.
The induction is performed using a new model of grammatical inference, the weighted Grammar-based
Classifier System (wGCS). wGCS derives from learning classifier systems and searches the grammar structure
using a genetic algorithm and covering. Rule weights are estimated using a novel Inside-Outside
Contrastive Estimation algorithm. The proposed method employs direct negative evidence and learns the WCFG
from both positive and negative samples. Results of experiments on three synthetic context-free languages
show that wGCS is competitive with other statistics-based methods for unsupervised CFG learning
- …
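The inside probabilities at the core of any Inside-Outside style weight estimation can be sketched as follows. This is a minimal Python sketch assuming a weighted grammar already in Chomsky normal form; the grammar encoding (`binary` and `lexical` dictionaries) is an illustrative assumption, and the outside pass and the contrastive, negative-evidence neighborhoods used by the paper's estimator are omitted.

```python
from collections import defaultdict

def inside(words, binary, lexical, start="S"):
    """CKY-style inside pass: beta[(i, j, A)] is the total weight with which
    nonterminal A derives words[i:j].
    binary:  {(A, B, C): weight} for rules A -> B C
    lexical: {(A, w): weight}    for rules A -> w
    Returns the total weight of the sentence from the start symbol."""
    n = len(words)
    beta = defaultdict(float)
    # Base case: spans of length 1 from lexical rules.
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1, A)] += p
    # Recursion: sum over all split points and binary rules.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k, j, C)]
    return beta[(0, n, start)]
```

For example, under the grammar `S -> S S` (weight 0.3), `S -> "a"` (weight 0.7), the sentence "a a" has inside weight 0.3 * 0.7 * 0.7 = 0.147.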