4,300 research outputs found
GEMINI: A Natural Language System for Spoken-Language Understanding
Gemini is a natural language understanding system developed for spoken
language applications. The paper describes the architecture of Gemini, paying
particular attention to resolving the tension between robustness and
overgeneration. Gemini features a broad-coverage unification-based grammar of
English, fully interleaved syntactic and semantic processing in an all-paths,
bottom-up parser, and an utterance-level parser to find interpretations of
sentences that might not be analyzable as complete sentences. Gemini also
includes novel components for recognizing and correcting grammatical
disfluencies, and for doing parse preferences. This paper presents a
component-by-component view of Gemini, providing detailed relevant measurements
of size, efficiency, and performance.Comment: 8 pages, postscrip
Treebank-based acquisition of LFG parsing resources for French
Motivated by the expense in time and other resources to produce hand-crafted grammars, there has been increased interest in automatically obtained wide-coverage grammars from treebanks for natural language processing. In particular, recent years have seen the growth in interest in automatically obtained deep resources that can represent information absent from simple CFG-type structured treebanks
and which are considered to produce more language-neutral linguistic representations, such as dependency syntactic trees. As is often the case in early pioneering work on natural language processing, English has provided the focus of first efforts towards acquiring deep-grammar resources, followed by successful treatments of, for example, German, Japanese, Chinese and Spanish. However, no comparable large-scale automatically acquired deep-grammar resources have been obtained for French to date. The goal of this paper is to present the application of treebank-based language acquisition to the case of French. We show that with modest changes to the established parsing architectures, encouraging results can be obtained for French, with a best dependency structure f-score of 86.73%
On Parsing CHILDES
Research on child language acquisition would benefit from the availability of a large body of syntactically parsed utterances between parents and children. We consider the problem of generating such a ``treebank'' from the CHILDES corpus, which currently contains primarily orthographically transcribed speech tagged for lexical category
Automatic Extraction of Subcategorization from Corpora
We describe a novel technique and implemented system for constructing a
subcategorization dictionary from textual corpora. Each dictionary entry
encodes the relative frequency of occurrence of a comprehensive set of
subcategorization classes for English. An initial experiment, on a sample of 14
verbs which exhibit multiple complementation patterns, demonstrates that the
technique achieves accuracy comparable to previous approaches, which are all
limited to a highly restricted set of subcategorization classes. We also
demonstrate that a subcategorization dictionary built with the system improves
the accuracy of a parser by an appreciable amount.Comment: 8 pages; requires aclap.sty. To appear in ANLP-9
Parallel Distributed Grammar Engineering for Practical Applications
Based on a detailed case study of parallel grammar development distributed across two sites, we review some of the requirements for regression testing in grammar engineering, summarize our approach to systematic competence and performance profiling, and discuss our experience with grammar development for a commercial application. If possible, the workshop presentation will be organized around a software demonstration
- ā¦