Reconciling Abstract Structure and Concrete Data in Statistical Natural-Language Processing
Engineering and Applied Science
Three New Probabilistic Models for Dependency Parsing: An Exploration
After presenting a novel O(n^3) parsing algorithm for dependency grammar, we
develop three contrasting ways to stochasticize it. We propose (a) a lexical
affinity model where words struggle to modify each other, (b) a sense tagging
model where words fluctuate randomly in their selectional preferences, and (c)
a generative model where the speaker fleshes out each word's syntactic and
conceptual structure without regard to the implications for the hearer. We also
give preliminary empirical results from evaluating the three models' parsing
performance on annotated Wall Street Journal training text (derived from the
Penn Treebank). In these results, the generative (i.e., top-down) model
performs significantly better than the others, and does about equally well at
assigning part-of-speech tags.
Comment: 6 pages, LaTeX 2.09 packaged with 4 .eps files, also uses colap.sty and acl.bs
Some Novel Applications of Explanation-Based Learning to Parsing Lexicalized Tree-Adjoining Grammars
In this paper we present some novel applications of the Explanation-Based
Learning (EBL) technique to parsing Lexicalized Tree-Adjoining Grammars. The
novel aspects are (a) immediate generalization of parses in the training set,
(b) generalization over recursive structures and (c) representation of
generalized parses as Finite State Transducers. A highly impoverished parser
called a "stapler" has also been introduced. We present experimental results
using EBL for different corpora and architectures to show the effectiveness of
our approach.
Comment: uuencoded postscript fil
Can Subcategorisation Probabilities Help a Statistical Parser?
Research into the automatic acquisition of lexical information from corpora
is starting to produce large-scale computational lexicons containing data on
the relative frequencies of subcategorisation alternatives for individual
verbal predicates. However, the empirical question of whether this type of
frequency information can in practice improve the accuracy of a statistical
parser has not yet been answered. In this paper we describe an experiment with
a wide-coverage statistical grammar and parser for English and
subcategorisation frequencies acquired from ten million words of text which
shows that this information can significantly improve parse accuracy.
Comment: 9 pages, uses colacl.st
An alternative conception of tree-adjoining derivation
The precise formulation of derivation for tree-adjoining grammars has important ramifications for a wide variety of uses of the formalism, from syntactic analysis to semantic interpretation and statistical language modeling. We argue that the definition of tree-adjoining derivation must be reformulated in order to manifest the proper linguistic dependencies in derivations. The particular proposal is both precisely characterizable, through a compilation to linear indexed grammars, and computationally operational, by virtue of an efficient algorithm for recognition and parsing.
Engineering and Applied Science
Application Sublanguage and LTAG: The EGAL System (Sous-langage d'application et LTAG : le système EGAL)
Peer-reviewed conference with proceedings. We present a system dedicated to the design and testing of an application sublanguage for a human-machine dialogue system. EGAL is based on a general LTAG grammar of the language, which is specialized to a given application using a training corpus. The effort was twofold: first, defining a precise methodology, involving a Wizard-of-Oz experiment for corpus collection and estimates of the representativeness of the design corpus; and second, specifying the system's components with a view to implementing user-friendly, generic, and open tools.