Stochastic phonological grammars and acceptability
In foundational works of generative phonology it is claimed that subjects can reliably discriminate between possible but non-occurring words and words that could not be English. In this paper we examine the use of a probabilistic phonological parser for words to model experimentally obtained judgements of the acceptability of a set of nonsense words. We compared various methods of scoring the goodness of the parse as a predictor of acceptability. We found that the probability of the worst part of a word is not the best score of its acceptability, indicating that classical generative phonology and Optimality Theory miss an important fact: these approaches do not recognise a mechanism by which the frequency of well-formed parts may ameliorate the unacceptability of low-frequency parts. We argue that probabilistic generative grammars are demonstrably a more psychologically realistic model of phonological competence than standard generative phonology or Optimality Theory.
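A minimal sketch of the contrast between scoring methods that the abstract describes, assuming a toy decomposition of nonwords into onset/rime parts with made-up probabilities (the paper's parser and probability estimates are far richer than this):

```python
import math

# Hypothetical corpus-derived probabilities for syllable parts.
# These numbers are illustrative, not taken from the paper.
PART_PROB = {
    "str": 0.012, "ick": 0.030,   # frequent, well-formed parts
    "mr": 0.0001,                  # rare / marginal onset
}

def part_probs(parts):
    return [PART_PROB.get(p, 1e-6) for p in parts]  # tiny floor for unseen parts

def worst_part_score(parts):
    """Score a word by its least probable part (the OT-like criterion)."""
    return min(math.log(p) for p in part_probs(parts))

def cumulative_score(parts):
    """Score a word by the mean log-probability of all its parts, so
    frequent well-formed parts can offset a low-frequency one."""
    lps = [math.log(p) for p in part_probs(parts)]
    return sum(lps) / len(lps)

for word, parts in [("strick", ["str", "ick"]), ("mrick", ["mr", "ick"])]:
    print(word, round(worst_part_score(parts), 2), round(cumulative_score(parts), 2))
```

Under the cumulative score, the frequent rime "ick" partially compensates for the rare onset in "mrick", whereas the worst-part score is blind to that compensation.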
Decoding climate agreement: a graph neural network-based approach to understanding climate dynamics
This paper presents the ClimateSent-GAT Model, a novel approach that combines Graph Attention Networks (GATs) with natural language processing techniques to accurately identify and predict disagreements within Reddit comment-reply pairs. Our model classifies comment-reply pairs into three categories: agree, disagree, and neutral. Leveraging the inherent graph structure of Reddit comment-reply pairs, the model significantly outperforms existing benchmarks by capturing complex interaction patterns and sentiment dynamics. This research advances graph-based NLP methodologies and provides actionable insights for policymakers and educators in climate science communication.
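A minimal sketch of the kind of GAT classifier the abstract describes, assuming node features are precomputed text embeddings and edges link comments to their replies (the class and dimension names are illustrative, not the paper's code):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

NUM_CLASSES = 3  # agree / disagree / neutral, per the abstract

class CommentReplyGAT(torch.nn.Module):
    """Two-layer GAT over a comment-reply graph; illustrative only."""
    def __init__(self, in_dim, hidden_dim=64, heads=4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden_dim, heads=heads)
        self.gat2 = GATConv(hidden_dim * heads, NUM_CLASSES, heads=1)

    def forward(self, x, edge_index):
        # x: [num_nodes, in_dim] text embeddings of comments/replies
        # edge_index: [2, num_edges] comment -> reply links
        h = F.elu(self.gat1(x, edge_index))
        return self.gat2(h, edge_index)  # per-node class logits

# Toy usage: random features for 5 nodes and 4 reply edges.
x = torch.randn(5, 768)
edge_index = torch.tensor([[0, 0, 1, 2], [1, 2, 3, 4]])
logits = CommentReplyGAT(in_dim=768)(x, edge_index)
print(logits.shape)  # torch.Size([5, 3])
```

Pair-level labels could then be obtained by pooling the two endpoint representations of each comment-reply edge; the paper's actual architecture may differ.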
Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models
Vector space models of word meaning all share the assumption that words occurring in similar contexts have similar meanings. In such models, words that are similar in their topical associations but differ in their logical force tend to emerge as semantically close, creating well-known challenges for NLP applications that involve logical reasoning. Modern pretrained language models, such as BERT, RoBERTa, GPT-2 and GPT-3, hold the promise of performing better on logical tasks than classic static word embeddings. However, reports are mixed about their success. In the current paper, we advance this discussion through a systematic study of scalar adverbs, an under-explored class of words with strong logical force. Using three different tasks, involving both naturalistic social media data and constructed examples, we investigate the extent to which BERT, RoBERTa, GPT-2 and GPT-3 exhibit general, human-like knowledge of these common words. We ask: 1) Do the models distinguish amongst the three semantic categories of MODALITY, FREQUENCY and DEGREE? 2) Do they have implicit representations of full scales from maximally negative to maximally positive? 3) How do word frequency and contextual factors impact model performance? We find that despite capturing some aspects of logical meaning, the models fall far short of human performance.
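A minimal sketch of one way to probe a masked language model for scalar-adverb knowledge, assuming a fill-mask setup over a constructed frame (the frame sentence and adverb list are illustrative, not the paper's stimuli):

```python
from transformers import pipeline

# BERT fill-mask probe: compare the model's probabilities for
# FREQUENCY-scale adverbs inserted into the same constructed frame.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

frame = "I would [MASK] recommend this restaurant to a friend."
adverbs = ["never", "rarely", "sometimes", "often", "always"]

# `targets` restricts scoring to the candidate adverbs.
for pred in unmasker(frame, targets=adverbs):
    print(f"{pred['token_str']:>10}  p={pred['score']:.4f}")
```

A model with human-like knowledge of the scale should assign these adverbs probabilities whose ranking respects their position from maximally negative to maximally positive; such rankings can then be compared against human judgements.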
Temporal adaptation of BERT and performance on downstream document classification: insights from social media
Language use differs between domains, and even within a domain, language use changes over time. For pre-trained language models like BERT, domain adaptation through continued pre-training has been shown to improve performance on in-domain downstream tasks. In this article, we investigate whether temporal adaptation can bring additional benefits. For this purpose, we introduce a corpus of social media comments sampled over three years. It contains unlabelled data for adaptation and evaluation on an upstream masked language modelling task, as well as labelled data for fine-tuning and evaluation on a downstream document classification task. We find that temporality matters for both tasks: temporal adaptation improves performance on the upstream task, and temporal fine-tuning improves performance on the downstream task. Time-specific models generally perform better on past than on future test sets, which matches evidence on the bursty usage of topical words. However, adapting BERT to time and domain does not improve performance on the downstream task over adapting to domain alone. Token-level analysis shows that temporal adaptation captures event-driven changes in language use in the downstream task, but not those changes that are actually relevant to task performance. Based on our findings, we discuss when temporal adaptation may be more effective.
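A minimal sketch of the continued pre-training (adaptation) step the abstract refers to, assuming a plain text file of unlabelled comments from one time period (the file path and hyperparameters are illustrative):

```python
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical file of unlabelled comments from one time period.
ds = load_dataset("text", data_files={"train": "comments_2020.txt"})["train"]
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=128),
            batched=True, remove_columns=["text"])

# Standard MLM objective: randomly mask 15% of tokens.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-temporal-2020",
                           num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=ds,
    data_collator=collator,
)
trainer.train()  # the adapted checkpoint is then fine-tuned downstream
```

Training one such checkpoint per time slice is what allows the past-versus-future test-set comparison the abstract reports.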
The Meaning of Intonational Contours in the Interpretation of Discourse
Recent investigations of the contribution that intonation makes to overall utterance and discourse interpretation promise new sources of information for the investigation of long-standing concerns in NLP. In Hirschberg & Pierrehumbert 1986 we proposed that intonational features such as phrasing, accent placement, pitch range, and tune represent important sources of information about the attentional and intentional structures of discourse. In this paper we examine the particular contribution of choice of tune, or intonational contour, to discourse interpretation.
Predicting COVID-19 cases using Reddit posts and other online resources
This paper evaluates the ability to predict COVID-19 caseloads in local areas using the text of geographically specific subreddits, in conjunction with other features. The problem is constructed as a binary classification task on whether the caseload change exceeds a threshold or not. We find that including Reddit features alongside other informative resources improves the models' performance in predicting COVID-19 cases. Moreover, we show that Reddit features alone can serve as a strong alternative data source for predicting a short-term rise in caseload, given their predictive performance and the fact that they are readily available and update instantaneously.
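A minimal sketch of the kind of binary classifier the abstract describes, combining subreddit text with other numeric features (the toy data, feature names, and model choice are illustrative; the paper's features and models may differ):

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical weekly examples: subreddit text, other numeric features
# (e.g. mobility, prior caseload), and a label for "change > threshold".
texts = ["lots of people coughing at work this week", "parks busy, all fine"]
other = np.array([[0.8, 120.0], [0.2, 35.0]])
y = np.array([1, 0])

# Turn the subreddit text into TF-IDF features and stack with the rest.
vec = TfidfVectorizer(min_df=1)
X = hstack([vec.fit_transform(texts), csr_matrix(other)])

clf = LogisticRegression().fit(X, y)
print(clf.predict(X))  # binary: caseload rise exceeds threshold or not
```

Dropping the `other` block and keeping only the text features corresponds to the Reddit-only setting the abstract highlights.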
Probing large language models for scalar adjective lexical semantics and scalar diversity pragmatics
Scalar adjectives pertain to various domain scales and vary in intensity within each scale (e.g. certain is more intense than likely on the likelihood scale). Scalar implicatures arise from the consideration of alternative statements which could have been made. They can be triggered by scalar adjectives and require listeners to reason pragmatically about them. Some scalar adjectives are more likely to trigger scalar implicatures than others. This phenomenon is referred to as scalar diversity. In this study, we probe different families of Large Language Models such as GPT-4 for their knowledge of the lexical semantics of scalar adjectives and one specific aspect of their pragmatics, namely scalar diversity. We find that they encode rich lexical-semantic information about scalar adjectives. However, this rich lexical-semantic knowledge does not entail a good understanding of scalar diversity. We also compare current models of different sizes and complexities, and find that larger models are not always better. Finally, we explain our probing results by leveraging linguistic intuitions and model training objectives.