
    Combining Corpus-Based Features for Selecting Best Natural Language Sentences

    Abstract: Automated paraphrasing of natural language text has many interesting applications, from aiding better translations to generating language in a better and more appropriate style. In this paper, we are concerned with the problem of picking the best English sentence out of a set of machine-generated paraphrase sentences, each designed to express the same content as a human-generated original. We present a system for scoring sentences based on examples in large corpora. Specifically, we use the Microsoft Web N-Gram service and the text of the Brown Corpus to extract features from all candidate sentences and compare them against each other. We consider three feature combination methods: a handcrafted decision tree, linear regression, and linear powerset regression. We find that while each method has particular strengths, linear powerset regression performs best against our human-evaluated test data.
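To make the idea of corpus-based scoring concrete, here is a minimal sketch in Python. It is not the paper's system: the three-sentence `CORPUS` is a hypothetical stand-in for the Brown Corpus and the Microsoft Web N-Gram service, the two features (average unigram and bigram log-probability) are assumed for illustration, and the `WEIGHTS` are made-up values of the kind a linear regression would fit against human ratings.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the paper draws on the Brown Corpus and the
# Microsoft Web N-Gram service instead.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
]


def tokenize(sentence):
    return sentence.lower().split()


# Count unigrams and bigrams, with a sentence-start marker.
unigrams = Counter()
bigrams = Counter()
for line in CORPUS:
    toks = ["<s>"] + tokenize(line)
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

VOCAB_SIZE = len(unigrams)
TOTAL = sum(unigrams.values())


def features(sentence):
    """Return (avg unigram log-prob, avg bigram log-prob).

    Both use add-one smoothing with one extra slot for unseen words.
    """
    toks = ["<s>"] + tokenize(sentence)
    words = toks[1:]
    uni = sum(
        math.log((unigrams[w] + 1) / (TOTAL + VOCAB_SIZE + 1)) for w in words
    ) / len(words)
    bi = sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + VOCAB_SIZE + 1))
        for a, b in zip(toks, toks[1:])
    ) / len(words)
    return (uni, bi)


# Hypothetical weights; a regression would learn these from rated examples.
WEIGHTS = (0.3, 0.7)


def score(sentence):
    """Linear combination of the corpus-based features."""
    return sum(w * f for w, f in zip(WEIGHTS, features(sentence)))


def pick_best(candidates):
    """Return the candidate with the highest combined score."""
    return max(candidates, key=score)


candidates = [
    "the cat sat on the mat",
    "mat the on sat cat the",
    "the cat the mat sat on",
]
best = pick_best(candidates)
print(best)
```

All three candidates contain the same words, so the unigram feature cannot separate them; the bigram feature does the work by rewarding word orders attested in the corpus. The abstract does not define the decision tree or the powerset regression, so neither is sketched here.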
