1,648 research outputs found

The frequency spectrum of finite samples from the intermittent silence process

Author: Ferrer Cancho Ramon
Gavaldà Mestre Ricard
Publication venue: Wiley
Publication date: 01/01/2009

It has been argued that the actual distribution of word frequencies could be reproduced or explained by generating a random sequence of letters and spaces according to the so-called intermittent silence process. The same kind of process could reproduce or explain the counts of other kinds of units from a wide range of disciplines. Taking the linguistic metaphor, we focus on the frequency spectrum, i.e., the number of words with a certain frequency, and the vocabulary size, i.e., the number of different words of text generated by an intermittent silence process. We derive and explain how to calculate accurately and efficiently the expected frequency spectrum and the expected vocabulary size as a function of the text size. Peer reviewed. Postprint (author's final draft).
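The two quantities defined in this abstract can be illustrated on a toy text. The sketch below is not the paper's method: it computes the empirical frequency spectrum and vocabulary size of a given word list, whereas the paper derives their expected values. The function names and the toy text are assumptions made for illustration.

```python
from collections import Counter


def frequency_spectrum(words):
    """Map each frequency f to the number of distinct words occurring exactly f times."""
    word_counts = Counter(words)               # word -> number of occurrences
    return dict(Counter(word_counts.values()))  # frequency -> number of words


def vocabulary_size(words):
    """Number of different words in the text."""
    return len(set(words))


# Hypothetical toy text: "a" occurs 3 times, "b" twice, "c" and "d" once each.
toy_text = "a b a c b a d".split()
spectrum = frequency_spectrum(toy_text)
vocab = vocabulary_size(toy_text)
print(spectrum, vocab)
```

Summing f times the number of words with frequency f over the spectrum recovers the text size, which is a quick consistency check on any computed spectrum.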

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Can simple models explain Zipf’s law for all exponents?

Author: Ferrer Cancho Ramon
Servedio Vito D. P.
Publication venue: RAM-Verlag
Publication date: 01/01/2005

H. Simon proposed a simple stochastic process for explaining Zipf’s law for word frequencies. Here we introduce two similar generalizations of Simon’s model that cover the same range of exponents as the standard Simon model. The mathematical approach followed minimizes the amount of mathematical background needed for deriving the exponent, compared to previous approaches to the standard Simon model. Reviewing what is known from other simple explanations of Zipf’s law, we conclude there is no single radically simple explanation covering the whole range of variation of the exponent of Zipf’s law in humans. The meaningfulness of Zipf’s law for word frequencies remains an open question. Peer reviewed. Postprint (published version).

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Current challenges for preseismic electromagnetic emissions: shedding light from micro-scale plastic flow, granular packings, phase transitions and self-affinity notion of fracture process

Author: Eftaxias K.
Potirakis S. M.
Publication venue: Copernicus GmbH
Publication date: 13/08/2013

Are there credible electromagnetic (EM) earthquake (EQ) precursors? This is a question debated in the scientific community, and there may be legitimate reasons for the critical views. The negative view concerning the existence of EM precursors is reinforced by features accompanying their observation that are considered paradoxical, namely that these signals (i) are not observed at the time of EQ occurrence or during the aftershock period, (ii) are not accompanied by large precursory strain changes, (iii) are not accompanied by simultaneous geodetic or seismological precursors, and (iv) have traceability that is considered problematic. In this work, the detected candidate EM precursors are studied through a shift in thinking towards basic science findings on granular packings, micron-scale plastic flow, interface depinning, fracture size effects, concepts drawn from phase transitions, the self-affine notion of the fracture and faulting process, universal features of fracture surfaces, recent high-quality laboratory studies, theoretical models and numerical simulations. Strict criteria are established for defining an emerged EM anomaly as a preseismic one, while precursory EM features that have been considered paradoxes are explained. A three-stage model for EQ generation by means of preseismic fracture-induced EM emissions is proposed. The claim that the observed EM precursors may permit real-time, step-by-step monitoring of EQ generation is tested.

arXiv.org e-Print Archive

Directory of Open Access Journals

Compression and the origins of Zipf's law for word frequencies

Author: Ferrer-i-Cancho Ramon
Publication venue: Wiley
Publication date: 01/01/2016

Here we sketch a new derivation of Zipf's law for word frequencies based on optimal coding. The structure of the derivation is reminiscent of Mandelbrot's random typing model but it has multiple advantages over random typing: (1) it starts from realistic cognitive pressures, (2) it does not require fine tuning of parameters, and (3) it sheds light on the origins of other statistical laws of language and thus can lead to a compact theory of linguistic laws. Our findings suggest that the recurrence of Zipf's law in human languages could originate from pressure for easy and fast communication. Comment: arguments have been improved; in press in Complexity (Wiley).

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Improved multiple birdsong tracking with distribution derivative method and Markov renewal process clustering

Author: Bonada J
Musevic S
Plumbley MD
Stowell D
Publication venue: IEEE
Publication date: 01/01/2013

DS & MP are supported by an EPSRC Leadership Fellowship EP/G007144/1

arXiv.org e-Print Archive

Crossref

University of Surrey

UPF Digital Repository

Queen Mary Research Online

Surrey Research Insight

Optimal coding and the origins of Zipfian laws

Author: Bentz Christian
Ferrer-i-Cancho Ramon
Seguin Caio
Publication venue: Informa UK Limited
Publication date: 29/05/2020

The problem of compression in standard information theory consists of assigning codes as short as possible to numbers. Here we consider the problem of optimal coding, under an arbitrary coding scheme, and show that it predicts Zipf's law of abbreviation, namely a tendency in natural languages for more frequent words to be shorter. We apply this result to investigate optimal coding also under so-called non-singular coding, a scheme where unique segmentation is not warranted but codes stand for a distinct number. Optimal non-singular coding predicts that the length of a word should grow approximately as the logarithm of its frequency rank, which is again consistent with Zipf's law of abbreviation. Optimal non-singular coding in combination with the maximum entropy principle also predicts Zipf's rank-frequency distribution. Furthermore, our findings on optimal non-singular coding challenge common beliefs about random typing. It turns out that random typing is in fact an optimal coding process, in stark contrast with the common assumption that it is detached from cost-cutting considerations. Finally, we discuss the implications of optimal coding for the construction of a compact theory of Zipfian laws and other linguistic laws. Comment: in press in the Journal of Quantitative Linguistics; definition of concordant pair corrected, proofs polished, references updated.

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Information content versus word length in random typing

Author: Ferrer-i-Cancho Ramon
Martín Fermín Moscoso del Prado
Publication venue: IOP Publishing
Publication date: 01/01/2011

Recently, it has been claimed that a linear relationship between a measure of information content and word length is expected from word length optimization, and it has been shown that this linearity is supported by a strong correlation between information content and word length in many languages (Piantadosi et al. 2011, PNAS 108, 3825-3826). Here, we study in detail some connections between this measure and standard information theory. The relationship between the measure and word length is studied for the popular random typing process, where a text is constructed by pressing keys at random from a keyboard containing letters and a space behaving as a word delimiter. Although this random process does not optimize word lengths according to information content, it exhibits a linear relationship between information content and word length. The exact slope and intercept are presented for three major variants of the random typing process. A strong correlation between information content and word length can simply arise from the units making a word (e.g., letters) and not necessarily from the interplay between a word and its context as proposed by Piantadosi et al. In itself, the linear relation does not entail the results of any optimization process.
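The random typing process described in this abstract can be simulated in a few lines. The following is a minimal sketch under assumed settings, not one of the three variants analysed in the paper: all letters are taken to be equally likely, empty words produced by consecutive spaces are discarded, and the names `random_typing`, `alphabet`, `p_space` and `seed` are hypothetical.

```python
import random


def random_typing(n_keystrokes, alphabet="ab", p_space=0.2, seed=0):
    """Press n_keystrokes keys at random and return the resulting list of words.

    Assumptions for illustration: the space key is hit with probability
    p_space, otherwise a letter is drawn uniformly from alphabet.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    keys = []
    for _ in range(n_keystrokes):
        if rng.random() < p_space:
            keys.append(" ")                   # space acts as word delimiter
        else:
            keys.append(rng.choice(alphabet))  # uniformly chosen letter
    # str.split() with no argument collapses runs of spaces,
    # so empty words are dropped.
    return "".join(keys).split()


words = random_typing(1000)
print(len(words), words[:10])
```

Feeding the resulting word list into a frequency count gives an empirical rank-frequency distribution that can be compared with word lengths.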

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

HAL AMU

HAL Descartes

Hal-Diderot

Parallels of human language in the behavior of bottlenose dolphins

Author: Ferrer-i-Cancho R.
Lusseau D.
McCowan B.
Publication date: 05/05/2016

A short review of similarities between dolphins and humans with the help of quantitative linguistics and information theory.

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Online Research Database In Technology

Two Universality Properties Associated with the Monkey Model of Zipf's Law

Author: Perline Richard
Perline Ronald
Publication venue: MDPI AG
Publication date: 30/11/2015

The distribution of word probabilities in the monkey model of Zipf's law is associated with two universality properties: (1) the power law exponent converges strongly to -1 as the alphabet size increases and the letter probabilities are specified as the spacings from a random division of the unit interval for any distribution with a bounded density function on [0,1]; and (2) on a logarithmic scale, the version of the model with a finite word length cutoff and unequal letter probabilities is approximately normally distributed in the part of the distribution away from the tails. The first property is proved using a remarkably general limit theorem for the logarithm of sample spacings from Shao and Hahn, and the second property follows from Anscombe's central limit theorem for a random number of i.i.d. random variables. The finite word length model leads to a hybrid Zipf-lognormal mixture distribution closely related to work in other areas. Comment: 14 pages, 3 figures.

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals