Treebank-based acquisition of LFG parsing resources for French
Motivated by the time and other resources needed to produce hand-crafted grammars, there has been increased interest in automatically acquiring wide-coverage grammars from treebanks for natural language processing. In particular, recent years have seen growing interest in automatically obtained deep resources that can represent information absent from simple CFG-type structured treebanks and that are considered to produce more language-neutral linguistic representations, such as syntactic dependency trees. As is often the case in early pioneering work in natural language processing, English has been the focus of the first efforts towards acquiring deep-grammar resources, followed by successful treatments of, for example, German, Japanese, Chinese and Spanish. However, no comparable large-scale automatically acquired deep-grammar resources have been obtained for French to date. The goal of this paper is to present the application of treebank-based language acquisition to the case of French. We show that with modest changes to the established parsing architectures, encouraging results can be obtained for French, with a best dependency structure f-score of 86.73%.
Corpus Annotation for Parser Evaluation
We describe a recently developed corpus annotation scheme for evaluating
parsers that avoids shortcomings of current methods. The scheme encodes
grammatical relations between heads and dependents, and has been used to mark
up a new public-domain corpus of naturally occurring English text. We show how
the corpus can be used to evaluate the accuracy of a robust parser, and relate
the corpus to extant resources.
Comment: 7 pages, LaTeX (uses eaclap.sty)
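As a rough illustration of how a corpus annotated with grammatical relations between heads and dependents can be used to score a parser, the sketch below compares predicted relations against gold relations and reports precision, recall and F-score. It is a minimal sketch, not the scheme from the abstract: the `(relation, head, dependent)` triple format, the relation labels, and the function name `score_relations` are all assumptions made for illustration.

```python
from collections import Counter


def score_relations(gold, predicted):
    """Return (precision, recall, f1) over multisets of relation triples.

    Each item is a hypothetical (relation, head, dependent) tuple; the
    annotation scheme described in the abstract may encode relations differently.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a triple counts as matched once per shared occurrence.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / sum(pred_counts.values()) if pred_counts else 0.0
    recall = matched / sum(gold_counts.values()) if gold_counts else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the parser attaches the determiner to the wrong head.
gold = [("subj", "sleeps", "cat"), ("det", "cat", "the")]
pred = [("subj", "sleeps", "cat"), ("det", "sleeps", "the")]
precision, recall, f1 = score_relations(gold, pred)
```

Scoring on relation triples rather than bracketed constituents means a single attachment error costs one relation instead of several brackets, which is one reason relation-based evaluation is often preferred.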
Optimality Theory as a Framework for Lexical Acquisition
This paper re-investigates a lexical acquisition system initially developed
for French. We show that, interestingly, the architecture of the system
reproduces and implements the main components of Optimality Theory. However, we
formulate the hypothesis that some of its limitations are mainly due to a poor
representation of the constraints used. Finally, we show how a better
representation of the constraints used would yield better results.
New Methods, Current Trends and Software Infrastructure for NLP
The increasing use of 'new methods' in NLP, which the NeMLaP conference
series exemplifies, occurs in the context of a wider shift in the nature and
concerns of the discipline. This paper begins with a short review of this
context and significant trends in the field. The review motivates and leads to
a set of requirements for support software of general utility for NLP research
and development workers. A freely available system designed to meet these
requirements is described: GATE, a General Architecture for Text
Engineering. Information Extraction (IE), in the sense defined by the Message
Understanding Conferences (ARPA \cite{Arp95}), is an NLP application in which
many of the new methods have found a home (Hobbs \cite{Hob93}; Jacobs ed.
\cite{Jac92}). An IE system based on GATE is also available for research
purposes, and this is described. Lastly, we review related work.
Comment: 12 pages, LaTeX, uses nemlap.sty (included)