Estimation of Stochastic Attribute-Value Grammars using an Informative Sample
We argue that some of the computational complexity associated with estimation
of stochastic attribute-value grammars can be reduced by training upon an
informative subset of the full training set. Results using the parsed Wall
Street Journal corpus show that in some circumstances, it is possible to obtain
better estimation results using an informative sample than when training upon
all the available material. Further experimentation demonstrates that with
unlexicalised models, a Gaussian Prior can reduce overfitting. However, when
models are lexicalised and contain overlapping features, overfitting does not
seem to be a problem, and a Gaussian Prior makes minimal difference to
performance. Our approach is applicable to situations where there is an
infeasibly large number of parses in the training set, or where
recovery of these parses from a packed representation is itself computationally
expensive.
Comment: 6 pages, 2 figures. Coling 2000, Saarbrücken, Germany. pp
586--59
An improved parser for data-oriented lexical-functional analysis
We present an LFG-DOP parser which uses fragments from LFG-annotated
sentences to parse new sentences. Experiments with the Verbmobil and Homecentre
corpora show that (1) Viterbi n-best search performs about 100 times faster
than Monte Carlo search while both achieve the same accuracy; (2) the DOP
hypothesis which states that parse accuracy increases with increasing fragment
size is confirmed for LFG-DOP; (3) LFG-DOP's relative frequency estimator
performs worse than a discounted frequency estimator; and (4) LFG-DOP
significantly outperforms Tree-DOP if evaluated on tree structures only.
Comment: 8 page
Modeling Graph Languages with Grammars Extracted via Tree Decompositions
Work on probabilistic models of natural language tends to focus on strings and trees, but there is increasing interest in more general graph-shaped structures since they seem to be better suited for representing natural language semantics, ontologies, or other varieties of knowledge structures. However, while there are relatively simple approaches to defining generative models over strings and trees, it has proven more challenging for more general graphs. This paper describes a natural generalization of the n-gram to graphs, making use of Hyperedge Replacement Grammars to define generative models of graph languages.
9 page(s