Do origin and facts identify automatically generated text?

Abstract

We present a proof of concept investigating whether native language identification and fact checking information improve a language model (GPT-2) classifier that determines whether a piece of text was written by a human or a machine. Since automatic text generation is trained on the writings of many individuals, we hypothesize that there will not be a clear native language for 'the writer', and therefore that a native language identification module can be used in reverse, i.e. when a native language cannot be identified, the probability of automatic generation is higher. Automatically generated text is also known to hallucinate, i.e. to make up content. To this end, we integrate a Wikipedia fact checking module. Both pieces of information are simply added to the input of the GPT-2 classifier and result in an improvement over its baseline performance in the English-language human-or-generated subtask of the Automated Text Identification (AuTexTification) shared task [1].
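To make the input-augmentation idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes two hypothetical helper functions, nli_confidence and fact_check_score, standing in for the native language identification and Wikipedia fact checking modules, and shows their outputs being prepended as plain text to the sequence fed to a GPT-2 sequence classifier.

```python
# Sketch of the augmentation described in the abstract: auxiliary signals are
# "simply added to the input" of a GPT-2 classifier. The two helper functions
# below are placeholders, not the paper's actual modules.
import torch
from transformers import GPT2Tokenizer, GPT2ForSequenceClassification

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id


def nli_confidence(text: str) -> float:
    """Placeholder: confidence of the native language identification module."""
    return 0.12  # dummy value for illustration


def fact_check_score(text: str) -> float:
    """Placeholder: agreement of the text with retrieved Wikipedia facts."""
    return 0.87  # dummy value for illustration


def classify(text: str) -> int:
    # Prepend the auxiliary signals as text before tokenization.
    augmented = f"nli={nli_confidence(text):.2f} fact={fact_check_score(text):.2f} {text}"
    inputs = tokenizer(augmented, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Label order (0 = human, 1 = generated) is an assumption for this sketch.
    return int(logits.argmax(dim=-1))


print(classify("The Eiffel Tower was completed in 1889 in Paris."))
```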
