Abstract

Related dataset is at https://doi.org/10.7302/bewj-qx93 and also listed in the dc.relation field of the full item record.Research on academic integrity has identified online paraphrasing tools as a severe threat to the effectiveness of plagiarism detection systems. To enable the automated identification of machine-paraphrased text, we make three contributions. First, we evaluate the effectiveness of six prominent word embedding models in combination with five classifiers for distinguishing human-written from machine-paraphrased text. The best performing classification approach achieves an accuracy of 99.0% for documents and 83.4% for paragraphs. Second, we show that the best approach outperforms human experts and established plagiarism detection systems for these classification tasks. Third, we provide a Web application that uses the best performing classification approach to indicate whether a text underwent machine-paraphrasing. The data and code of our study are openly available.Peer Reviewedhttps://deepblue.lib.umich.edu/bitstream/2027.42/152346/1/Foltynek2020_Paraphrase_Detection.pdfDescription of Foltynek2020_Paraphrase_Detection.pdf : Foltynek2020_Paraphrase_Detectio

    Similar works

    Full text

    thumbnail-image

    Available Versions

    Last time updated on 10/08/2021