370 research outputs found
Can Subcategorisation Probabilities Help a Statistical Parser?
Research into the automatic acquisition of lexical information from corpora
is starting to produce large-scale computational lexicons containing data on
the relative frequencies of subcategorisation alternatives for individual
verbal predicates. However, the empirical question of whether this type of
frequency information can in practice improve the accuracy of a statistical
parser has not yet been answered. In this paper we describe an experiment with
a wide-coverage statistical grammar and parser for English and
subcategorisation frequencies acquired from ten million words of text which
shows that this information can significantly improve parse accuracy.Comment: 9 pages, uses colacl.st
unsupervised named entity recognition using syntactic and semantic contextual evidence
Proper nouns form an open class, making the incompleteness of manually or automatically learned classification rules an obvious problem. The purpose of this paper is twofold: first, to suggest the use of a complementary "backup" method to increase the robustness of any hand-crafted or machine-learning-based NE tagger; and second, to explore the effectiveness of using more fine-grained evidence—namely, syntactic and semantic contextual knowledge—in classifying NEs
A Pattern Matching method for finding Noun and Proper Noun Translations from Noisy Parallel Corpora
We present a pattern matching method for compiling a bilingual lexicon of
nouns and proper nouns from unaligned, noisy parallel texts of
Asian/Indo-European language pairs. Tagging information of one language is
used. Word frequency and position information for high and low frequency words
are represented in two different vector forms for pattern matching. New anchor
point finding and noise elimination techniques are introduced. We obtained a
73.1\% precision. We also show how the results can be used in the compilation
of domain-specific noun phrases.Comment: 8 pages, uuencoded compressed postscript file. To appear in the
Proceedings of the 33rd AC
- …