5,450 research outputs found
Automatic Extraction of Subcategorization from Corpora
We describe a novel technique and implemented system for constructing a
subcategorization dictionary from textual corpora. Each dictionary entry
encodes the relative frequency of occurrence of a comprehensive set of
subcategorization classes for English. An initial experiment, on a sample of 14
verbs which exhibit multiple complementation patterns, demonstrates that the
technique achieves accuracy comparable to previous approaches, which are all
limited to a highly restricted set of subcategorization classes. We also
demonstrate that a subcategorization dictionary built with the system improves
the accuracy of a parser by an appreciable amount.Comment: 8 pages; requires aclap.sty. To appear in ANLP-9
GEMINI: A Natural Language System for Spoken-Language Understanding
Gemini is a natural language understanding system developed for spoken
language applications. The paper describes the architecture of Gemini, paying
particular attention to resolving the tension between robustness and
overgeneration. Gemini features a broad-coverage unification-based grammar of
English, fully interleaved syntactic and semantic processing in an all-paths,
bottom-up parser, and an utterance-level parser to find interpretations of
sentences that might not be analyzable as complete sentences. Gemini also
includes novel components for recognizing and correcting grammatical
disfluencies, and for doing parse preferences. This paper presents a
component-by-component view of Gemini, providing detailed relevant measurements
of size, efficiency, and performance.Comment: 8 pages, postscrip
Apportioning Development Effort in a Probabilistic LR Parsing System through Evaluation
We describe an implemented system for robust domain-independent syntactic
parsing of English, using a unification-based grammar of part-of-speech and
punctuation labels coupled with a probabilistic LR parser. We present
evaluations of the system's performance along several different dimensions;
these enable us to assess the contribution that each individual part is making
to the success of the system as a whole, and thus prioritise the effort to be
devoted to its further enhancement. Currently, the system is able to parse
around 80% of sentences in a substantial corpus of general text containing a
number of distinct genres. On a random sample of 250 such sentences the system
has a mean crossing bracket rate of 0.71 and recall and precision of 83% and
84% respectively when evaluated against manually-disambiguated analyses.Comment: 10 pages, 1 Postscript figure. To Appear in Proceedings of the
Conference on Empirical Methods in Natural Language Processing, University of
Pennsylvania, May 199
Filling Knowledge Gaps in a Broad-Coverage Machine Translation System
Knowledge-based machine translation (KBMT) techniques yield high quality in
domains with detailed semantic models, limited vocabulary, and controlled input
grammar. Scaling up along these dimensions means acquiring large knowledge
resources. It also means behaving reasonably when definitive knowledge is not
yet available. This paper describes how we can fill various KBMT knowledge
gaps, often using robust statistical techniques. We describe quantitative and
qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT
system.Comment: 7 pages, Compressed and uuencoded postscript. To appear: IJCAI-9
A workshop on the gathering of information for problem formulation
Issued as Quarterly progress reports no. [1-5], Proceedings and Final contract report, Project no. G-36-651Papers presented at the Workshop/Symposium on Human Computer Interaction, March 26 and 27, 1981, Atlanta, G
Recommended from our members
The role of HG in the analysis of temporal iteration and interaural correlation
- …