Evolving linguistic divergence on polarizing social media
Language change is influenced by many factors, but often starts from
synchronic variation, where multiple linguistic patterns or forms coexist, or
where different speech communities use language in increasingly different ways.
Besides regional or economic reasons, communities may form and segregate based
on political alignment. The latter, referred to as political polarization, is
of growing societal concern across the world. Here we map and quantify
linguistic divergence across the partisan left-right divide in the United
States, using social media data. We develop a general methodology to delineate
(social) media users by their political preference, based on which (potentially
biased) news media accounts they do and do not follow on a given platform. Our
data consists of 1.5M short posts by 10k users (about 20M words) from the
social media platform Twitter (now "X"). Delineating this sample involved
mining the platform for the lists of followers (n=422M) of 72 large news media
accounts. We quantify divergence in topics of conversation and word
frequencies, messaging sentiment, and lexical semantics of words and emoji. We
find signs of linguistic divergence across all these aspects, especially in
topics and themes of conversation, in line with previous research. While US
American English remains largely intelligible within its large speech
community, our findings point to areas where miscommunication may eventually
arise given ongoing polarization and therefore potential linguistic divergence.
Our methodology - combining data mining, lexicostatistics, machine learning,
large language models and a systematic human annotation approach - is largely
language and platform agnostic. In other words, while we focus here on US
political divides and US English, the same approach is applicable to other
countries, languages, and social media platforms.
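The follower-based delineation described in this abstract can be illustrated with a minimal sketch. It assumes a simple rule (a user counts as "left" if they follow at least `min_follows` left-leaning outlets and no right-leaning ones, and vice versa); the outlet names, the threshold, and the rule itself are hypothetical and are not taken from the paper.

```python
from typing import Set

# Hypothetical outlet lists; the paper's 72 news media accounts are not named here.
LEFT_OUTLETS: Set[str] = {"left_news_1", "left_news_2", "left_news_3"}
RIGHT_OUTLETS: Set[str] = {"right_news_1", "right_news_2", "right_news_3"}


def delineate(followed: Set[str],
              left: Set[str] = LEFT_OUTLETS,
              right: Set[str] = RIGHT_OUTLETS,
              min_follows: int = 2) -> str:
    """Assign a user to a side from the outlets they do and do not follow.

    Assumed rule (illustrative only): enough follows on one side
    and none on the other; everyone else stays unassigned.
    """
    n_left = len(followed & left)
    n_right = len(followed & right)
    if n_left >= min_follows and n_right == 0:
        return "left"
    if n_right >= min_follows and n_left == 0:
        return "right"
    return "unassigned"
```

Requiring zero follows on the opposite side is a deliberately strict choice: it trades sample size for cleaner group separation, and a softer ratio-based rule would be an equally plausible reading.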
Mappings between linguistic sound and motion
This paper provides an overview of the possible function of non-arbitrary mappings between linguistic form and meaning, and presents new empirical evidence showing that shared cross-modal associations may underlie motion sound-symbolism in particular. In terms of function, several lines of empirical and theoretical evidence suggest that non-arbitrary form-meaning connections could have played a crucial role in lexical emergence during language evolution. Furthermore, the persistence of such non-arbitrariness in some areas of modern language may also be highly functional, as recent data has shown that non-arbitrary forms may help to bootstrap learning in children (Imai, Kita, Nagumo, and Okada, 2008) and adults (Nielsen and Rendall, 2012). Given the functional role of these non-arbitrary mappings between linguistic form and meaning, this paper describes new experimental data demonstrating shared mappings between nonsense words and visual motion using a direct matching task. Participants were given nonsense words that varied in terms of their voicing, reduplication, and vowel quality, and asked to change the movement of a ball to match a given word. Results show that back vowels are mapped onto slower speeds, and consonant reduplication with vowel alternation is mapped onto faster speeds. These results show a shared cross-modal association between linguistic sound and motion, which is likely leveraged in sound-symbolic systems found in natural language.
Shared cross-modal associations and the emergence of the lexicon
This thesis centres around a sensory theory of protolanguage emergence, or STP. The
STP proposes that shared biases to make associations between sensory modalities provided
the basis for the emergence of a shared protolinguistic lexicon. Crucially, this
lexicon would have been grounded in our perceptual systems, and thus fundamentally
non-arbitrary. The foundation of such a lexicon lies in shared cross-modal associations:
biases shared among language users to map properties in one modality (e.g.,
visual size) onto another (e.g., vowel sounds). While there is broad evidence that we
make associations between a variety of modalities (Spence, 2011), this thesis focuses
specifically on associations involving linguistic sound, arguing that these associations
would have been most important in language emergence. Early linguistic utterances,
by virtue of their grounding in shared cross-modal associations, could be formed and
understood with high mutual intelligibility.
The first chapter of the thesis will outline this theory in detail, addressing the nature
of the proposed protolanguage system, arguing for the utility of non-arbitrariness
at the point of language emergence, and proposing evidence for the likely transition
from a non-arbitrary protolanguage to the predominantly arbitrary language systems
we observe today. The remainder of the thesis will focus on providing empirical evidence
to support this theory in two ways: (i) presenting experimental data showing
evidence of shared associations between linguistic sound and other modalities, and (ii)
providing evidence that such associations are evident cross-linguistically, despite the
predominantly arbitrary nature of modern languages.
Chapter two will examine well-documented associations between vowel quality
and physical size (e.g., /i/ is small, and /a/ is large; Sapir, 1929). This chapter
presents a new experimental approach which fails to find robust associations between
vowel quality and size absent the use of a forced choice paradigm. Chapter three
turns to associations between linguistic sound and shape angularity, taking a critical
perspective on the classic takete/maluma experiment (Köhler, 1929). New empirical
evidence shows that the acquisition of visual word forms plays a highly influential role
in mediating associations between linguistic sound and angularity, but that associations
between linguistic sound and visual form also play a minor role in auditory tasks.
Chapter four will examine a relatively unexplored modality: taste. A simple survey
which asks participants to choose non-words to match representative tastes shows that
certain linguistic sounds are preferred for certain food items. In a more detailed study,
we use a more direct perceptual matching task with actual tastants and synthesised
speech sounds, further showing that people make robust shared associations between
linguistic sound and taste. Chapter five returns to the visual modality, considering
previously unexamined associations between linguistic sound and motion, specifically
the feature of speed. This study demonstrates that people do make robust associations
between the two modalities, particularly for vowel quality.
Chapter six will aim to take a different empirical approach, considering non-arbitrariness
in natural language. Motivated by the experimental data from the previous chapters,
we turn to corpus analyses to assess the presence of non-arbitrariness in natural language
which concurs with behavioural data showing linguistic cross-modal associations.
First, a corpus analysis of taste synonyms in English shows small but significant
correlations between form and meaning. With the goal of addressing the universality
of specific sound-meaning associations, we examine cross-linguistic corpora of taste
and motion terms, showing that particular phonological features tend to connect to
certain tastes and types of motion across genetically and geographically distinct languages.
Lastly, the thesis will conclude by considering the STP in light of the empirical
evidence presented, and suggesting possible future empirical directions to explore the
theory more broadly.
Double-blind reviewing and gender biases at EvoLang conferences
A previous study of reviewing at the Evolution of Language conferences found effects suggesting that gender bias against female authors was alleviated under double-blind review at EvoLang 11. We update this analysis in two specific ways. First, we add data from the most recent EvoLang 12 conference, providing a comprehensive picture of the conference over five iterations. Like EvoLang 11, EvoLang 12 used double-blind review, but EvoLang 12 showed no significant difference in review scores between genders. We discuss potential explanations for why there was a strong effect in EvoLang 11, which is largely absent in EvoLang 12. These include testing whether readability differs between genders, though we find no evidence to support this. Although gender differences seem to have declined for EvoLang 12, we suggest that double-blind review provides a more equitable evaluation process.
Cross-modal associations and synaesthesia: Categorical perception and structure in vowel-colour mappings in a large online sample
We report associations between vowel sounds, graphemes, and colours collected online from over 1000 Dutch speakers. We provide open materials including a Python implementation of the structure measure, and code for a single page web application to run simple cross-modal tasks. We also provide a full dataset of colour-vowel associations from 1164 participants, including over 200 synaesthetes identified using consistency measures. Our analysis reveals salient patterns in cross-modal associations, and introduces a novel measure of isomorphism in cross-modal mappings. We find that while acoustic features of vowels significantly predict certain mappings (replicating prior work), both vowel phoneme category and grapheme category are even better predictors of colour choice. Phoneme category is the best predictor of colour choice overall, pointing to the importance of phonological representations in addition to acoustic cues. Generally, high/front vowels are lighter, more green, and more yellow than low/back vowels. Synaesthetes respond more strongly on some dimensions, choosing lighter and more yellow colours for high and mid front vowels than non-synaesthetes. We also present a novel measure of cross-modal mappings adapted from ecology, which uses a simulated distribution of mappings to measure the extent to which participants' actual mappings are structured isomorphically across modalities. Synaesthetes have mappings that tend to be more structured than non-synaesthetes, and more consistent colour choices across trials correlate with higher structure scores. Nevertheless, the large majority (~70%) of participants produce structured mappings, indicating that the capacity to make isomorphically structured mappings across distinct modalities is shared to a large extent, even if the exact nature of mappings varies across individuals. 
Overall, this novel structure measure suggests a distribution of structured cross-modal association in the population, with synaesthetes on one extreme and participants with unstructured associations on the other.
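One way to picture a structure measure built on a simulated distribution of mappings is a Mantel-style permutation test, sketched below. This is an assumption about the general technique, not the authors' published Python implementation: the function names, the use of Euclidean distance, and the z-score summary are all illustrative choices.

```python
import itertools
import math
import random
import statistics
from typing import List, Sequence, Tuple

Point = Sequence[float]


def pairwise_distances(points: List[Point]) -> List[float]:
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(a, b) for a, b in itertools.combinations(points, 2)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def structure_score(vowels: List[Point], colours: List[Point],
                    n_sim: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return (observed correlation, z-score against shuffled mappings).

    vowels[i] is mapped to colours[i]; shuffling the colours simulates
    mappings with no systematic structure (assumed null model).
    """
    rng = random.Random(seed)
    d_vowel = pairwise_distances(vowels)
    observed = pearson(d_vowel, pairwise_distances(colours))
    simulated = []
    for _ in range(n_sim):
        shuffled = list(colours)
        rng.shuffle(shuffled)
        simulated.append(pearson(d_vowel, pairwise_distances(shuffled)))
    sd = statistics.pstdev(simulated)
    z = (observed - statistics.fmean(simulated)) / sd if sd else 0.0
    return observed, z
```

Here a participant's vowels would be points in some acoustic space and their chosen colours points in a colour space; a high z-score indicates that vowels close together received similar colours more often than in shuffled mappings.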
The regularity game: Investigating linguistic rule dynamics in a population of interacting agents
Rules are an efficient feature of natural languages which allow speakers to use a finite set of instructions to generate a virtually infinite set of utterances. Yet, for many regular rules, there are irregular exceptions. There has been lively debate in cognitive science about how individual learners acquire rules and exceptions; for example, how they learn that the past tense of preach is preached, but that of teach is taught. However, for most population or language-level models of language structure, particularly from the perspective of language evolution, the goal has generally been to examine how languages evolve stable structure, neglecting the fact that in many cases, languages exhibit exceptions to structural rules. We examine the dynamics of regularity and irregularity across a population of interacting agents to investigate how, for example, the irregular teach coexists beside the regular preach in a dynamic language system. Models show that in the absence of individual biases towards either regularity or irregularity, the outcome of a system is determined entirely by the initial condition. On the other hand, in the presence of individual biases, rule systems exhibit frequency dependent patterns in regularity reminiscent of patterns found in natural language. We implement individual biases towards regularity in two ways: through ‘child’ agents who have a preference to generalise using the regular form, and through a memory constraint wherein an agent can only remember an irregular form for a finite time period. We provide theoretical arguments for the prediction of a critical frequency below which irregularity cannot persist in terms of the duration of the finite time period which constrains agent memory. Further, within our framework we also find stable irregularity, arguably a feature of most natural languages not accounted for in many other cultural models of language structure.
General three state model with biased population replacement: Analytical solution and application to language dynamics
Empirical evidence shows that the rate of irregular usage of English verbs
exhibits discontinuity as a function of their frequency: the most frequent
verbs tend to be totally irregular. We aim to qualitatively understand the
origin of this feature by studying simple agent-based models of language
dynamics, where each agent adopts an inflectional state for a verb and may
change it upon interaction with other agents. At the same time, agents are
replaced at some rate by new agents adopting the regular form. In models with
only two inflectional states (regular and irregular), we observe that either
all verbs regularize irrespective of their frequency, or a continuous
transition occurs between a low frequency state where the lemma becomes fully
regular, and a high frequency one where both forms coexist. Introducing a third
(mixed) state, wherein agents may use either form, we find that a third,
qualitatively different behavior may emerge, namely, a discontinuous transition
in frequency. We introduce and solve analytically a very general class of
three-state models that allows us to fully understand these behaviors in a
unified framework. Realistic sets of interaction rules, including the
well-known Naming Game (NG) model, result in a discontinuous transition, in
agreement with recent empirical findings. We also point out that the
distinction between speaker and hearer in the interaction has no effect on the
collective behavior. The results for the general three-state model, although
discussed in terms of language dynamics, are widely applicable.
Comment: 14 pages, 6 figures. Final published version.
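The kind of three-state agent-based model described in this abstract can be sketched as a small simulation. The interaction rule below follows the standard two-word Naming Game (success collapses both agents to the uttered form, failure moves the hearer to the mixed state); the parameter names, the all-irregular starting population, and the use of `replace_prob` as a stand-in for inverse verb frequency are assumptions of this sketch, not the paper's exact formulation.

```python
import random
from typing import Dict, Tuple

REGULAR, IRREGULAR, MIXED = "R", "I", "M"


def speak(state: str, rng: random.Random) -> str:
    """A mixed agent utters either form at random; others utter their own."""
    return rng.choice((REGULAR, IRREGULAR)) if state == MIXED else state


def interact(speaker: str, hearer: str, rng: random.Random) -> Tuple[str, str]:
    """Naming Game update: return the new (speaker, hearer) states."""
    form = speak(speaker, rng)
    if hearer == form or hearer == MIXED:
        return form, form      # success: both keep only the uttered form
    return speaker, MIXED      # failure: hearer now knows both forms


def simulate(n_agents: int = 100, steps: int = 10_000,
             replace_prob: float = 0.01, seed: int = 0) -> Dict[str, float]:
    """Fraction of agents in each state after `steps` updates.

    Each step either replaces a random agent with a new regular-form
    agent (probability replace_prob) or lets two distinct agents interact.
    """
    rng = random.Random(seed)
    agents = [IRREGULAR] * n_agents  # assumed initial condition
    for _ in range(steps):
        if rng.random() < replace_prob:
            agents[rng.randrange(n_agents)] = REGULAR
        else:
            i, j = rng.sample(range(n_agents), 2)
            agents[i], agents[j] = interact(agents[i], agents[j], rng)
    return {s: agents.count(s) / n_agents for s in (REGULAR, IRREGULAR, MIXED)}
```

Sweeping `replace_prob` and recording the irregular fraction is one way to explore how the outcome depends on the balance between interaction and replacement; the sketch makes no claim about reproducing the transition reported in the paper.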