10 research outputs found
Evolving linguistic divergence on polarizing social media
Language change is influenced by many factors, but often starts from synchronic variation, where multiple linguistic patterns or forms coexist, or where different speech communities use language in increasingly different ways. Besides regional or economic reasons, communities may form and segregate based on political alignment. The latter, referred to as political polarization, is of growing societal concern across the world. Here we map and quantify linguistic divergence across the partisan left-right divide in the United States, using social media data. We develop a general methodology to delineate (social) media users by their political preference, based on which (potentially biased) news media accounts they do and do not follow on a given platform. Our data consists of 1.5M short posts by 10k users (about 20M words) from the social media platform Twitter (now “X”). Delineating this sample involved mining the platform for the lists of followers (n = 422M) of 72 large news media accounts. We quantify divergence in topics of conversation and word frequencies, messaging sentiment, and lexical semantics of words and emoji. We find signs of linguistic divergence across all these aspects, especially in topics and themes of conversation, in line with previous research. While US American English remains largely intelligible within its large speech community, our findings point at areas where miscommunication may eventually arise given ongoing polarization and therefore potential linguistic divergence. Our flexible methodology — combining data mining, lexicostatistics, machine learning, large language models and a systematic human annotation approach — is largely language and platform agnostic. In other words, while we focus here on US political divides and US English, the same approach is applicable to other countries, languages, and social media platforms
Topical advection as a baseline model for corpus-based lexical dynamics
An important question in the field of corpus-based evolutionary language dynamics research is concerned with distinguishing selection-driven linguistic change from neutral evolution, and from changes stemming from language-external factors (cultural drift). A commonly used proxy for the popularity or selective fitness of an element is its corpus frequency. However, a number of recent works have pointed out that raw frequencies can often be misleading. We propose a model for controlling for drift in contextual topics in corpora - the topical-cultural advection model - and demonstrate that this simple measure is capable of accounting for a considerable amount of variability in word frequency changes in a corpus spanning two centuries of language use
Reliable Detection and Quantification of Selective Forces in Language Change
Language change is a cultural evolutionary process in which variants of
linguistic variables change in frequency through processes analogous to
mutation, selection and genetic drift. In this work, we apply a
recently-introduced method to corpus data to quantify the strength of selection
in specific instances of historical language change. We first demonstrate, in
the context of English irregular verbs, that this method is more reliable and
interpretable than similar methods that have previously been applied. We
further extend this study to demonstrate that a bias towards phonological
simplicity overrides that favouring grammatical simplicity when these are in
conflict. Finally, with reference to Spanish spelling reforms, we show that the
method can also detect points in time at which selection strengths change, a
feature that is generically expected for socially-motivated language change.
Together, these results indicate how hypotheses for mechanisms of language
change can be tested quantitatively using historical corpus data
Conceptual similarity and communicative need shape colexification: An experimental study
Colexification refers to the phenomenon of multiple meanings sharing one word
in a language. Cross-linguistic lexification patterns have been shown to be
largely predictable, as similar concepts are often colexified. We test a recent
claim that, beyond this general tendency, communicative needs play an important
role in shaping colexification patterns. We approach this question by means of
a series of human experiments, using an artificial language communication game
paradigm. Our results across four experiments match the previous
cross-linguistic findings: all other things being equal, speakers do prefer to
colexify similar concepts. However, we also find evidence supporting the
communicative need hypothesis: when faced with a frequent need to distinguish
similar pairs of meanings, speakers adjust their colexification preferences to
maintain communicative efficiency, and avoid colexifying those similar meanings
which need to be distinguished in communication. This research provides further
evidence to support the argument that languages are shaped by the needs and
preferences of their speakers
How individuals change language
Languages emerge and change over time at the population level through interactions between individual speakers. It is, however, hard to directly observe how a single speaker's linguistic innovation precipitates a population-wide change in the language, and many theoretical proposals exist. We introduce a very general mathematical model that encompasses a wide variety of individual-level linguistic behaviours and provides statistical predictions for the population-level changes that result from them. This model allows us to compare the likelihood of empirically-attested changes in definite and indefinite articles in multiple languages under different assumptions on the way in which individuals learn and use language. We find that accounts of language change that appeal primarily to errors in childhood language acquisition are very weakly supported by the historical data, whereas those that allow speakers to change incrementally across the lifespan are more plausible, particularly when combined with social network effects
Compression ensembles quantify aesthetic complexity and the evolution of visual art
Acknowledgements: We would like to thank Dr. Mikhail Tamm for helpful discussions. Thumbnail previews of artworks depicted for informative purposes as fair use.
Funder: Royal Society
Abstract: To the human eye, different images appear more or less complex, but capturing this intuition in a single aesthetic measure is considered hard. Here, we propose a computationally simple, transparent method for modeling aesthetic complexity as a multidimensional algorithmic phenomenon, which enables the systematic analysis of large image datasets. The approach captures visual family resemblance via a multitude of image transformations and subsequent compressions, yielding explainable embeddings. It aligns well with human judgments of visual complexity, and performs well in authorship and style recognition tasks. Showcasing the functionality, we apply the method to 125,000 artworks, recovering trends and revealing new insights regarding historical art, artistic careers over centuries, and emerging aesthetics in a contemporary NFT art market. Our approach, here applied to images but applicable more broadly, provides a new perspective to quantitative aesthetics, connoisseurship, multidimensional meaning spaces, and the study of cultural complexity.