1,386 research outputs found
Recalibrating machine learning for social biases: demonstrating a new methodology through a case study classifying gender biases in archival documentation
This thesis proposes a recalibration of Machine Learning for social biases to minimize harms from existing approaches and practices in the field. Prioritizing quality over quantity, accuracy over efficiency, representativeness over convenience, and situated thinking over universal thinking, the thesis demonstrates an alternative approach to creating Machine Learning models. Drawing on GLAM, the Humanities, the Social Sciences, and Design, the thesis focuses on understanding and communicating biases in a specific use case. 11,888 metadata descriptions from the University of Edinburgh Heritage Collections' Archives catalog were manually annotated for gender biases and text classification models were then trained on the resulting dataset of 55,260 annotations. Evaluations of the models' performance demonstrates that annotating gender biases can be automated; however, the subjectivity of bias as a concept complicates the generalizability of any one approach.
The contributions are: (1) an interdisciplinary and participatory Bias-Aware Methodology, (2) a Taxonomy of Gendered and Gender Biased Language, (3) data annotated for gender biased language, (4) gender biased text classification models, and (5) a human-centered approach to model evaluation. The contributions have implications for Machine Learning, demonstrating how bias is inherent to all data and models; more specifically for Natural Language Processing, providing an annotation taxonomy, annotated datasets and classification models for analyzing gender biased language at scale; for the Gallery, Library, Archives, and Museum sector, offering guidance to institutions seeking to reconcile with histories of marginalizing communities through their documentation practices; and for historians, who utilize cultural heritage documentation to study and interpret the past. Through a real-world application of the Bias-Aware Methodology in a case study, the thesis illustrates the need to shift away from removing social biases and towards acknowledging them, creating data and models that surface the uncertainty and multiplicity characteristic of human societies
Evaluation of ChatGPT Family of Models for Biomedical Reasoning and Classification
Recent advances in large language models (LLMs) have shown impressive ability
in biomedical question-answering, but have not been adequately investigated for
more specific biomedical applications. This study investigates the performance
of LLMs such as the ChatGPT family of models (GPT-3.5s, GPT-4) in biomedical
tasks beyond question-answering. Because no patient data can be passed to the
OpenAI API public interface, we evaluated model performance with over 10000
samples as proxies for two fundamental tasks in the clinical domain -
classification and reasoning. The first task is classifying whether statements
of clinical and policy recommendations in scientific literature constitute
health advice. The second task is causal relation detection from the biomedical
literature. We compared LLMs with simpler models, such as bag-of-words (BoW)
with logistic regression, and fine-tuned BioBERT models. Despite the excitement
around viral ChatGPT, we found that fine-tuning for two fundamental NLP tasks
remained the best strategy. The simple BoW model performed on par with the most
complex LLM prompting. Prompt engineering required significant investment.Comment: 28 pages, 2 tables and 4 figures. Submitting for revie
Linguistically inspired roadmap for building biologically reliable protein language models
Deep neural-network-based language models (LMs) are increasingly applied to
large-scale protein sequence data to predict protein function. However, being
largely black-box models and thus challenging to interpret, current protein LM
approaches do not contribute to a fundamental understanding of
sequence-function mappings, hindering rule-based biotherapeutic drug
development. We argue that guidance drawn from linguistics, a field specialized
in analytical rule extraction from natural language data, can aid with building
more interpretable protein LMs that are more likely to learn relevant
domain-specific rules. Differences between protein sequence data and linguistic
sequence data require the integration of more domain-specific knowledge in
protein LMs compared to natural language LMs. Here, we provide a
linguistics-based roadmap for protein LM pipeline choices with regard to
training data, tokenization, token embedding, sequence embedding, and model
interpretation. Incorporating linguistic ideas into protein LMs enables the
development of next-generation interpretable machine-learning models with the
potential of uncovering the biological mechanisms underlying sequence-function
relationships.Comment: 27 pages, 4 figure
Talking about personal recovery in bipolar disorder: Integrating health research, natural language processing, and corpus linguistics to analyse peer online support forum posts
Background: Personal recovery, ‘living a satisfying, hopeful and contributing lifeeven with the limitations caused by the illness’ (Anthony, 1993) is of particular value in bipolar disorder where symptoms often persist despite treatment. So far, personal recovery has only been studied in researcher-constructed environments (interviews, focus groups). Support forum posts can serve as a complementary naturalistic data source. Objective: The overarching aim of this thesis was to study personal recovery experiences that people living with bipolar disorder have shared in online support forums through integrating health research, NLP, and corpus linguistics in a mixed methods approach within a pragmatic research paradigm, while considering ethical issues and involving people with lived experience. Methods: This mixed-methods study analysed: 1) previous qualitative evidence on personal recovery in bipolar disorder from interviews and focus groups 2) who self-reports a bipolar disorder diagnosis on the online discussion platform Reddit 3) the relationship of mood and posting in mental health-specific Reddit forums (subreddits) 4) discussions of personal recovery in bipolar disorder subreddits. Results: A systematic review of qualitative evidence resulted in the first framework for personal recovery in bipolar disorder, POETIC (Purpose & meaning, Optimism & hope, Empowerment, Tensions, Identity, Connectedness). Mainly young or middle-aged US-based adults self-report a bipolar disorder diagnosis on Reddit. Of these, those experiencing more intense emotions appear to be more likely to post in mental health support subreddits. Their personal recovery-related discussions in bipolar disorder subreddits primarily focussed on three domains: Purpose & meaning (particularly reproductive decisions, work), Connectedness (romantic relationships, social support), Empowerment (self-management, personal responsibility). Support forum data highlighted personal recovery issues that exclusively or more frequently came up online compared to previous evidence from interviews and focus groups. Conclusion: This project is the first to analyse non-reactive data on personal recovery in bipolar disorder. Indicating the key areas that people focus on in personal recovery when posting freely and the language they use provides a helpful starting point for formal and informal carers to understand the concerns of people diagnosed with bipolar disorder and to consider how best to offer support
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
The widespread adoption of large language models (LLMs) makes it important to
recognize their strengths and limitations. We argue that in order to develop a
holistic understanding of these systems we need to consider the problem that
they were trained to solve: next-word prediction over Internet text. By
recognizing the pressures that this task exerts we can make predictions about
the strategies that LLMs will adopt, allowing us to reason about when they will
succeed or fail. This approach - which we call the teleological approach -
leads us to identify three factors that we hypothesize will influence LLM
accuracy: the probability of the task to be performed, the probability of the
target output, and the probability of the provided input. We predict that LLMs
will achieve higher accuracy when these probabilities are high than when they
are low - even in deterministic settings where probability should not matter.
To test our predictions, we evaluate two LLMs (GPT-3.5 and GPT-4) on eleven
tasks, and we find robust evidence that LLMs are influenced by probability in
the ways that we have hypothesized. In many cases, the experiments reveal
surprising failure modes. For instance, GPT-4's accuracy at decoding a simple
cipher is 51% when the output is a high-probability word sequence but only 13%
when it is low-probability. These results show that AI practitioners should be
careful about using LLMs in low-probability situations. More broadly, we
conclude that we should not evaluate LLMs as if they are humans but should
instead treat them as a distinct type of system - one that has been shaped by
its own particular set of pressures.Comment: 50 pages plus 11 page of references and 23 pages of appendice
Workshop Proceedings of the 12th edition of the KONVENS conference
The 2014 issue of KONVENS is even more a forum for exchange: its main topic is the interaction between Computational Linguistics and Information Science, and the synergies such interaction, cooperation and integrated views can produce. This topic at the crossroads of different research traditions which deal with natural language as a container of knowledge, and with methods to extract and manage knowledge that is linguistically represented is close to the heart of many researchers at the Institut für Informationswissenschaft und Sprachtechnologie of Universität Hildesheim: it has long been one of the institute’s research topics, and it has received even more attention over the last few years
- …