Combining Corpus-Based Features for Selecting Best Natural Language Sentences

Authors: Foaad Khosmood, Robert Levinson
Publication date: 05/03/2020

Abstract: Automated paraphrasing of natural language text has many interesting applications, from aiding in better translations to generating better and more stylistically appropriate language. In this paper, we are concerned with the problem of picking the best English sentence out of a set of machine-generated paraphrase sentences, each designed to express the same content as a human-generated original. We present a system for scoring sentences based on examples in large corpora. Specifically, we use the Microsoft Web N-Gram service and the text of the Brown Corpus to extract features from all candidate sentences and compare them against each other. We consider three feature combination methods: a handcrafted decision tree, linear regression, and linear powerset regression. We find that while each method has particular strengths, linear powerset regression performs best against our human-evaluated test data.
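
The idea of scoring candidate sentences with corpus-derived features and combining those features linearly can be illustrated with a minimal sketch. This is not the authors' system: the two features (average smoothed unigram and bigram log-probabilities), the tiny in-line corpus, the function names, and the fixed weights are all assumptions made for illustration. In the paper the features come from the Microsoft Web N-Gram service and the Brown Corpus, and the combination is fit against human-evaluated data rather than set by hand.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def features(sentence, unigrams, bigrams):
    """Return [avg unigram log-prob, avg bigram log-prob] with add-one smoothing.

    Assumes the sentence has at least two whitespace-separated tokens.
    """
    tokens = sentence.lower().split()
    vocab = len(unigrams)
    total = sum(unigrams.values())
    uni = [math.log((unigrams[(t,)] + 1) / (total + vocab)) for t in tokens]
    bi = [
        math.log((bigrams[(a, b)] + 1) / (unigrams[(a,)] + vocab))
        for a, b in zip(tokens, tokens[1:])
    ]
    return [sum(uni) / len(uni), sum(bi) / len(bi)]


def pick_best(candidates, corpus_text, weights=(0.5, 0.5)):
    """Pick the candidate with the highest weighted sum of features.

    The weights are hypothetical placeholders; a regression fit to
    human judgments would supply them in a real system.
    """
    tokens = corpus_text.lower().split()
    unigrams = ngram_counts(tokens, 1)
    bigrams = ngram_counts(tokens, 2)

    def score(sentence):
        feats = features(sentence, unigrams, bigrams)
        return sum(w * f for w, f in zip(weights, feats))

    return max(candidates, key=score)


corpus = "the cat sat on the mat . the dog sat on the rug ."
candidates = ["the cat sat on the mat", "mat the on sat cat the"]
best = pick_best(candidates, corpus)
```

Both candidates use the same words, so their unigram feature is identical and the bigram feature decides the choice in favor of the word order attested in the corpus.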

CiteSeerX

Combining Corpus-Based Features for Selecting Best Natural Language Sentences

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref