T5 meets Tybalt: Author Attribution in Early Modern English Drama Using Large Language Models
Large language models have shown breakthrough potential in many NLP domains.
Here we consider their use for stylometry, specifically authorship
identification in Early Modern English drama. We find both promising and
concerning results; LLMs are able to accurately predict the author of
surprisingly short passages but are also prone to confidently misattribute
texts to specific authors. A fine-tuned t5-large model outperforms all tested
baselines, including logistic regression, SVM with a linear kernel, and cosine
delta, at attributing small passages. However, we see indications that the
presence of certain authors in the model's pre-training data affects predictive
results in ways that are difficult to assess.
Comment: Published in CHR 202