370 research outputs found

    Can Subcategorisation Probabilities Help a Statistical Parser?

    Full text link
    Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type of frequency information can in practice improve the accuracy of a statistical parser has not yet been answered. In this paper we describe an experiment with a wide-coverage statistical grammar and parser for English and subcategorisation frequencies acquired from ten million words of text which shows that this information can significantly improve parse accuracy.Comment: 9 pages, uses colacl.st

    unsupervised named entity recognition using syntactic and semantic contextual evidence

    Get PDF
    Proper nouns form an open class, making the incompleteness of manually or automatically learned classification rules an obvious problem. The purpose of this paper is twofold: first, to suggest the use of a complementary "backup" method to increase the robustness of any hand-crafted or machine-learning-based NE tagger; and second, to explore the effectiveness of using more fine-grained evidence—namely, syntactic and semantic contextual knowledge—in classifying NEs

    A Pattern Matching method for finding Noun and Proper Noun Translations from Noisy Parallel Corpora

    Full text link
    We present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs. Tagging information of one language is used. Word frequency and position information for high and low frequency words are represented in two different vector forms for pattern matching. New anchor point finding and noise elimination techniques are introduced. We obtained a 73.1\% precision. We also show how the results can be used in the compilation of domain-specific noun phrases.Comment: 8 pages, uuencoded compressed postscript file. To appear in the Proceedings of the 33rd AC
    • …
    corecore