Population size predicts lexical diversity, but so does the mean sea level - why it is important to correctly account for the structure of temporal data

Author: Koplenig Alexander
Mueller-Spitzer Carolin
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

To demonstrate why it is important to correctly account for the (serially dependent) structure of temporal data, we document an apparently spectacular relationship between population size and lexical diversity: for five out of seven investigated languages, there is a strong relationship between a country's population size and the lexical diversity of its primary language. We show that this relationship is the result of a misspecified model that does not consider the temporal aspect of the data by presenting a similar but nonsensical relationship between the global annual mean sea level and lexical diversity. Several recent studies present surprising links between economic, cultural, political and (socio-)demographic variables on the one hand and cultural or linguistic characteristics on the other, but seem to suffer from exactly this problem; we therefore explain the cause of the misspecification and show that it has profound consequences. We demonstrate how a simple transformation of the time series can often solve problems of this type and argue that evaluating the plausibility of a relationship is important in this context. We hope that our paper will help both researchers and reviewers to understand why it is important to use special models for the analysis of data with a natural temporal ordering.
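As a rough illustration of the point made in this abstract (not the authors' code), the sketch below builds two hypothetical series that are generated independently and share nothing but an upward trend. It then compares their correlation in levels with their correlation after first-differencing. First-differencing is assumed here as one example of a "simple transformation"; the series names, slopes, noise level, and seed are all invented for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def first_difference(xs):
    """Year-on-year changes: x[t] - x[t-1]."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(42)  # fixed seed so the run is repeatable
years = range(200)

# Two unrelated, hypothetical series that both happen to trend upward.
series_a = [1.0 * t + rng.gauss(0, 1) for t in years]
series_b = [2.0 * t + rng.gauss(0, 1) for t in years]

r_levels = pearson(series_a, series_b)
r_diff = pearson(first_difference(series_a), first_difference(series_b))

print(f"correlation in levels:      {r_levels:.3f}")
print(f"correlation in differences: {r_diff:.3f}")
```

In this setup the shared trend dominates the level correlation, while the differenced series retain only the independent noise, so their correlation should be close to zero.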

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Publikationsserver des Instituts für Deutsche Sprache

FigShare

Neutral evolution and turnover over centuries of English word popularity

Author: Acerbi Alberto
Bentley R. Alexander
Garnett Philip
Hruschka Daniel J.
Ruck Damian
Publication date: 01/01/2017

Here we test neutral models against the evolution of English word frequency and vocabulary at the population scale, as recorded in annual word frequencies from three centuries of English-language books. Against these data, we test both static and dynamic predictions of two neutral models, including the relation between corpus size and vocabulary size, frequency distributions, and turnover within those frequency distributions. Although a commonly used neutral model fails to replicate all these emergent properties at once, we find that a modified two-stage neutral model does replicate the static and dynamic properties of the corpus data. This two-stage model is meant to represent a relatively small corpus (population) of English books, analogous to a 'canon', sampled by an exponentially increasing corpus of books in the wider population of authors. More broadly, this model, a smaller neutral model within a larger neutral model, could represent those situations where mass attention is focused on a small subset of the cultural variants. Comment: 12 pages, 5 figures, 1 table.
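For readers unfamiliar with neutral models, the sketch below shows a generic random-copying process with innovation. It is not the two-stage model proposed in the paper: it is only the basic single-population idea, and the function name `neutral_step`, the population size, the innovation rate `mu`, and the seed are hypothetical choices made for illustration.

```python
import random


def neutral_step(population, mu, rng, next_id):
    """Advance one generation of a basic neutral (random-copying) model.

    Each slot in the new generation either introduces a brand-new variant
    (with probability mu) or copies a variant chosen uniformly at random
    from the previous generation. Returns the new population and the next
    unused variant id.
    """
    new_population = []
    for _ in range(len(population)):
        if rng.random() < mu:
            new_population.append(next_id)  # innovation: a new variant
            next_id += 1
        else:
            new_population.append(rng.choice(population))  # unbiased copying
    return new_population, next_id


rng = random.Random(0)
pop = list(range(100))  # start with 100 distinct variants
next_id = 100

for generation in range(50):
    pop, next_id = neutral_step(pop, 0.02, rng, next_id)

print("distinct variants after 50 generations:", len(set(pop)))
```

Because copying is unbiased, any change in a variant's frequency here comes from sampling alone, which is what makes the process a baseline against which selection can be tested.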

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository

Languages cool as they expand: Allometric scaling and the decreasing need for new words

Author: A Clauset
A Gnedin
A Vespignani
AA Tsonis
AL Barabási
AM Petersen
B Mandelbrot
B Podobnik
D Fu
D Helbing
D Lazer
DC van Leijenhorst
E Alvarez-Lacalle
EA Altmann
EG Altmann
GB West
GW Oehlert
HA Makse
HAJrJSA Makse
HD Rozenfeld
J Gao
J-B Michel
JA Evans
L Lü
LAN Amaral
LMA Bettencourt
M Batty
M Kleiber
M Markosova
M Riccaboni
M Sigman
M Steyvers
MA Montemurro
MEJ Newman
MHR Stanley
MÁ Serrano
R Ferrer i Cancho
RN Mantegna
S Bernhardsson
S Karlin
SK Baek
X Gabaix
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/12/2012

We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between the corpus size and the vocabulary size of growing languages to demonstrate a decreasing marginal need for new words, a feature that is likely related to the underlying correlations between words. We calculate the annual growth fluctuations of word use, which show a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This "cooling pattern" forms the basis of a third statistical regularity, which, unlike the Zipf and Heaps laws, is dynamical in nature.
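The allometric scaling relation mentioned in this abstract is commonly written as a power law between vocabulary size V and corpus size N, V = k * N^b, where an exponent b below 1 corresponds to a decreasing marginal need for new words. The sketch below estimates b by ordinary least squares on log-transformed values. It is a minimal illustration on synthetic data, not the authors' procedure: the helper name `fit_power_law` and the values k = 10 and b = 0.5 are assumptions chosen so the result is easy to check.

```python
import math


def fit_power_law(sizes, vocab):
    """Fit vocab = k * sizes**b by least squares on the logs; return (k, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in vocab]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                 # slope in log-log space = scaling exponent
    k = math.exp(my - b * mx)     # intercept in log-log space = log(k)
    return k, b


# Synthetic, noise-free data generated from V = 10 * N**0.5 (hypothetical values).
corpus_sizes = [10 ** e for e in range(4, 10)]
vocab_sizes = [10.0 * n ** 0.5 for n in corpus_sizes]

k, b = fit_power_law(corpus_sizes, vocab_sizes)
print(f"estimated prefactor k = {k:.3f}, exponent b = {b:.3f}")
```

With real corpus counts the points would scatter around the fitted line, and a log-log least-squares fit is only one of several ways to estimate the exponent.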

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

Digital library of University of Maribor

PubMed Central

IMT Institutional Repository

Statistical Laws Governing Fluctuations in Word Use from Word Birth to Word Death

Author: A Baronchelli
A Puglisi
AA Tsonis
AL Barabási
AM Petersen
B Podobnik
D Canning
D Fu
D Rybski
E Alvarez-Lacalle
E Lieberman
EG Altmann
J-B Michel
K Hu
LAN Amaral
M Pagel
M Riccaboni
M Sigman
MA Montemurro
MA Nowak
MHR Stanley
MÁ Serrano
P Klimek
R Crane
R Ferrer i Cancho
RV Solé
S Bernhardsson
S Picoli Jr
SA Golder
ST Piantadosi
SV Buldyrev
V Loreto
V Plerou
Y Lee
Y Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

We analyze the dynamic properties of 10^7 words recorded in English, Spanish and Hebrew over the period 1800–2008 in order to gain insight into the coevolution of language and culture. We report language-independent patterns useful as benchmarks for theoretical models of language evolution. A significantly decreasing (increasing) trend in the birth (death) rate of words indicates a recent shift in the selection laws governing word use. For new words, we observe a peak in the growth-rate fluctuations around 40 years after introduction, consistent with the typical entry time into standard dictionaries and the human generational timescale. Pronounced changes in the dynamics of language during periods of war show that word correlations, occurring across time and between words, are largely influenced by coevolutionary social, technological, and political factors. We quantify cultural memory by analyzing the long-term correlations in the use of individual words using detrended fluctuation analysis. Comment: Version 1: 31 pages, 17 figures, 3 tables. Version 2 is streamlined, eliminates substantial material and incorporates referee comments: 19 pages, 14 figures, 3 tables.

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

PubMed Central

IMT Institutional Repository

Temporal dynamics of semantic relations in word embeddings: an application to predicting armed conflict participants

Author: Kutuzov Andrey
Velldal Erik
Øvrelid Lilja
Publication date: 01/01/2017

This paper deals with using word embedding models to trace the temporal dynamics of semantic relations between pairs of words. The set-up is similar to the well-known analogies task, but expanded with a time dimension. To this end, we apply incremental updating of the models with new training texts, including incremental vocabulary expansion, coupled with learned transformation matrices that let us map between members of the relation. The proposed approach is evaluated on the task of predicting insurgent armed groups based on geographical locations. The gold standard data for the time span 1994–2010 is extracted from the UCDP Armed Conflicts dataset. The results show that the method is feasible and outperforms the baselines, but also that important work still remains to be done. Comment: to appear in EMNLP 2017 proceedings.

arXiv.org e-Print Archive

Crossref

NORA - Norwegian Open Research Archives