2 research outputs found

    Robust Parsing Based on Discourse Information: Completing partial parses of ill-formed sentences on the basis of discourse information

    In a consistent text, many words and phrases are repeatedly used in more than one sentence. When an identical phrase (a set of consecutive words) is repeated in different sentences, the constituent words of those sentences tend to be associated in identical modification patterns with identical parts of speech and identical modifiee-modifier relationships. Thus, when a syntactic parser cannot parse a sentence as a unified structure, parts of speech and modifiee-modifier relationships among morphologically identical words in complete parses of other sentences within the same text provide useful information for obtaining partial parses of the sentence. In this paper, we describe a method for completing partial parses by maintaining consistency among morphologically identical words within the same text as regards their part of speech and their modifiee-modifier relationship. The experimental results obtained by using this method with technical documents offer good prospects for improving the accuracy of sentence analysis in a broad-coverage natural language processing system such as a machine translation system.
    Comment: To appear in Proceedings of ACL-95, 8 pages, 4 Postscript figures, uses aclap.sty and epsbox.st
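
The consistency idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it covers only the part-of-speech half of the idea (not modifiee-modifier relationships), and the function names, the `(word, tag)` data layout, and the majority-vote rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def collect_pos_votes(parsed_sentences):
    """Count how often each word form received each POS tag in complete parses.

    parsed_sentences: list of sentences, each a list of (word, pos) pairs
    taken from sentences of the same text that parsed successfully.
    """
    votes = defaultdict(Counter)
    for sentence in parsed_sentences:
        for word, pos in sentence:
            votes[word.lower()][pos] += 1
    return votes


def resolve_pos(candidates, votes):
    """Choose a tag for each word of a sentence that failed to parse.

    candidates: list of (word, [possible tags]) pairs (hypothetical format).
    The tag seen most often for the same word elsewhere in the text wins;
    ties and unseen words fall back to the first listed candidate.
    """
    resolved = []
    for word, options in candidates:
        seen = votes.get(word.lower(), Counter())
        # Score only the tags still considered possible for this word.
        scored = [(seen[tag], -i, tag) for i, tag in enumerate(options)]
        resolved.append((word, max(scored)[2]))
    return resolved


parsed = [
    [("the", "DET"), ("display", "NOUN"), ("shows", "VERB"), ("data", "NOUN")],
    [("a", "DET"), ("display", "NOUN"), ("panel", "NOUN")],
]
votes = collect_pos_votes(parsed)
ambiguous = [("Display", ["VERB", "NOUN"]), ("unit", ["VERB", "NOUN"])]
print(resolve_pos(ambiguous, votes))
```

Here "Display" is resolved to the noun reading because that is how the same word was tagged in the sentences that parsed completely, while "unit", which never occurs elsewhere, keeps its first candidate.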

    Tricolor DAGs for Machine Translation

    Machine translation (MT) has recently been formulated in terms of constraint-based knowledge representation and unification theories, but it is becoming more and more evident that it is not possible to design a practical MT system without an adequate method of handling mismatches between semantic representations in the source and target languages. In this paper, we introduce the idea of "information-based" MT, which is considerably more flexible than interlingual MT or the conventional transfer-based MT.
    Comment: 8 pages, Kanji text in the original paper has been romanized