Contrasting Linguistic Patterns in Human and LLM-Generated Text

Gómez-Rodríguez, Carlos; Muñoz-Ortiz, Alberto; Vilares, David

Publication date: 17 August 2023

Abstract

We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output from four LLMs of the LLaMA family. Our analysis spans several measurable linguistic dimensions, including morphological, syntactic, psychometric and sociolinguistic aspects. The results reveal various measurable differences between human and AI-generated texts. Among others, human texts exhibit more scattered sentence length distributions, a distinct use of dependency and constituent types, shorter constituents, and more aggressive emotions (fear, disgust) than LLM-generated texts. LLM outputs use more numbers, symbols and auxiliaries (suggesting objective language) than human texts, as well as more pronouns. The sexist bias prevalent in human text is also expressed by LLMs.
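As a rough illustration of what "more scattered sentence length distributions" means in measurable terms, the following Python sketch compares the spread of sentence lengths in two texts. It is not the authors' pipeline: the regex sentence splitter, whitespace tokenization, the helper names `sentence_lengths` and `length_dispersion`, and the use of the sample standard deviation as the dispersion measure are all assumptions made for illustration, and the two snippets are invented toy inputs, not data from the study.

```python
import re
import statistics


def sentence_lengths(text: str) -> list[int]:
    # Assumed naive splitter: break after ".", "!" or "?" followed by whitespace.
    # A real study would use a proper sentence segmenter.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumed length measure: number of whitespace-delimited tokens per sentence.
    return [len(s.split()) for s in sentences]


def length_dispersion(text: str) -> float:
    # Sample standard deviation of sentence lengths (assumed dispersion measure);
    # returns 0.0 when there are fewer than two sentences to compare.
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths)


# Hypothetical toy inputs, invented for this sketch.
human = ("Stocks fell. The central bank raised interest rates again on Tuesday, "
         "surprising many analysts who had expected a pause. Markets reacted.")
llm = ("The bank raised rates on Tuesday. Analysts had expected a pause in rates. "
       "Markets reacted to the news.")

print(sentence_lengths(human), length_dispersion(human))
print(sentence_lengths(llm), length_dispersion(llm))
```

A corpus-level comparison would aggregate such lengths over many documents per source and compare the resulting distributions, rather than looking at two short snippets.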

Available from the arXiv.org e-Print Archive (oai:arXiv.org:2308.09067).
