Chatbots Are Not Reliable Text Annotators
Recent research highlights the significant potential of ChatGPT for text
annotation in social science research. However, ChatGPT is a closed-source
product, which has major drawbacks with regard to transparency,
reproducibility, cost, and data protection. Recent advances in open-source (OS)
large language models (LLMs) offer alternatives that remedy these challenges.
It is therefore important to evaluate the performance of OS LLMs relative to
ChatGPT and to standard supervised machine learning approaches to
classification. We conduct a systematic comparative evaluation of a range of
OS LLMs alongside ChatGPT, using both zero- and few-shot learning as well as
generic and custom prompts, with results compared to those of more traditional
supervised classification models. Using a new dataset of tweets from US news
media, and focusing on simple binary text annotation tasks for standard social
science concepts, we find significant variation in the performance of ChatGPT
and OS models across the tasks, and that supervised classifiers consistently
outperform both. Given the unreliable performance of ChatGPT and the
significant challenges it poses to open science, we advise against using
ChatGPT for substantive text annotation tasks in social science research.
Rhetorical effects in illness writing: a coherence-based approach
This thesis uses cognitive-stylistic techniques to analyse rhetorical effects in a collection of non-fiction writing about illness. It draws on a broad range of related disciplines, including discourse analysis and cognitive psychology, and uses these approaches to conduct a close linguistic analysis of the texts analysed. The results of this analysis are linked to existing research in the medical humanities, specifically in relation to illness and narrative. In particular, this thesis describes how readers utilise certain linguistic features in order to construct a coherent mental representation of a text. It argues that certain strategies employed by readers to create these interpretations have rhetorical effects which go beyond coherence building.
To begin, I provide explicit definitions for some of the key terms which feature prominently in this thesis: illness writing; coherence; and rhetoric. Following this, I introduce the corpus of texts from which the examples used throughout this thesis are drawn. Alongside this, I introduce the specific linguistic features which will be studied in the subsequent analysis chapters. These features are introduced and analysed at increasing levels of linguistic abstraction, from more concrete to less. The analysis begins with a subset of English personal pronouns, before moving on to describe discourse structure in the form of repeating patterns of textual organisation. I then consider the role of external, ‘real-world’ knowledge in the construction of discourse coherence, before demonstrating how this knowledge can be blended to create new, creative ways of thinking about illness. The thesis closes with a summary of these results, along with some suggestions for potential future research. Finally, I conclude with a reflection on the methods and results of the thesis and point towards their wider applicability in the field of medical humanities more generally.
The original contribution of this thesis is therefore twofold. From a cognitive-stylistic perspective, it contributes to the understanding of the relationship between coherence-building strategies and their rhetorical effects. At the same time, it contributes to ongoing work in the medical humanities, which seeks to advance our understanding of the lived experience of illness.
Human feedback makes Large Language Models more human-like
The most recent generation of Large Language Models owes its success not only to scale, but also to a novel step in their training: reinforcement learning from human feedback (RLHF). In this study, we assessed the impact of this training regime on the fit between model behavior and human linguistic behavior. We evaluated three versions of OpenAI’s GPT-3 davinci – original, instruction-tuned, and RLHF-trained – using psycholinguistic tasks: subject-verb agreement, sentence acceptability, and event knowledge. We then compared their performance to that of human participants. We found that the RLHF model is significantly more human-like in its answers, including in the errors it commits. Moreover, the uncertainty of the distribution of its output is closely tied to between-subject variation in humans. This suggests that human feedback improves not only the overall quality of LLMs, but also the alignment between their behavior and the linguistic, metalinguistic, and discursive intuitions of humans.