With the rise of generative pre-trained transformer models such as GPT-3,
GPT-NeoX, or OPT, distinguishing human-generated texts from machine-generated
ones has become important. We fine-tuned five separate language models to
generate synthetic tweets and found that shallow learning classification
algorithms, such as Naive Bayes, achieve detection accuracies between 0.6 and
0.8.
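As a rough sketch of such a shallow-learning baseline (the abstract does not
specify the features or corpus, so the data below is a placeholder), a Naive
Bayes tweet detector can be assembled with scikit-learn:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder corpus: real experiments would use human tweets vs. tweets
# sampled from the fine-tuned language models.
tweets = [
    "just missed my train again, classic monday",
    "cannot believe the game last night, what a finish",
    "Exploring the vibrant tapestry of urban life today!",
    "Delighted to share my journey of synergistic innovation.",
]
labels = [0, 0, 1, 1]  # 0 = human-written, 1 = machine-generated

# Word 1- and 2-gram features feeding a multinomial Naive Bayes model.
detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
detector.fit(tweets, labels)

preds = detector.predict(tweets)
print("training accuracy:", accuracy_score(labels, preds))
```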
Shallow learning classifiers differ from human-based detection, especially at
higher temperature values during text generation, where their detection rate
drops. Humans instead prioritize linguistic acceptability, which tends to be
higher at lower temperature values.
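For context, temperature rescales the model's next-token distribution before
sampling (softmax of the logits divided by T), so higher values flatten the
distribution and yield more erratic text. A minimal generation sketch with
Hugging Face transformers, using GPT-2 as a stand-in for the fine-tuned
models:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("just got home and", return_tensors="pt")
for temperature in (0.7, 1.0, 1.3):
    torch.manual_seed(0)  # same seed, so only the temperature varies
    out = model.generate(
        **inputs,
        do_sample=True,           # sample instead of greedy decoding
        temperature=temperature,  # rescales logits: softmax(logits / T)
        max_new_tokens=30,
        pad_token_id=tok.eos_token_id,
    )
    print(temperature, tok.decode(out[0], skip_special_tokens=True))
```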
In contrast, transformer-based classifiers reach accuracies of 0.9 and above.
We found that a reinforcement learning approach to fine-tuning our generative
models can successfully evade BERT-based classifiers, reducing detection
accuracy to 0.15 or less.
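The reinforcement learning setup is not detailed here; one plausible reading
(an assumption, not necessarily the authors' method) is to reward the
generator for tweets the detector labels as human, and feed that reward into a
policy-gradient update such as PPO. A sketch of such a reward signal, with
`bert_detector` standing in for a fine-tuned BERT classifier (the head below
is freshly initialized for illustration only):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
# Hypothetical detector: in practice this head would be fine-tuned on
# human vs. machine tweets; label 1 = machine-generated (assumed).
bert_detector = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
bert_detector.eval()

def evasion_reward(generated_tweets):
    """Reward is high when the detector believes a tweet is human-written."""
    batch = tok(generated_tweets, return_tensors="pt",
                padding=True, truncation=True)
    with torch.no_grad():
        probs = bert_detector(**batch).logits.softmax(dim=-1)
    return 1.0 - probs[:, 1]  # 1 - P(machine-generated)

# These rewards would then drive a policy-gradient update of the generator.
print(evasion_reward(["just got home and made some tea"]))
```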
This paper has been accepted for the 57th Hawaii International Conference on
System Sciences (HICSS-57).