Emergence of linguistic laws in human voice

Author: A Clauset
A Corral
A Romberg
AS Park
B Corominas-Murtra
B Mandelbrot
B McCowan
C Kello
C Langton
DJ Schwab
F Font-Clos
F Lamel
H Brumm
I Moreno-Sanchez
J Baixeries
J Gillooly
J Glass
J Luque
J Saffran
L Doyle
L Egghe
L Emberson
L Ha
L Lü
M Aylett
M Bunge
M Gerlach
M Gustison
M Nowak
M Tyler
M van Egmond
N Chater
N Evans
O Peters
P Kuhl
R Ferrer i Cancho
R Ferrer-i Cancho
S Greenberg
S Piantadosi
ST Piantadosi
T Crystal
T Drugman
T Fitch
T Nabeshima
W Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 09/10/2016

Submitted for publication

arXiv.org e-Print Archive

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

PubMed Central

Queen Mary Research Online

On the origin of ambiguity in efficient communication

Author: B Corominas-Murtra
Bernat Corominas-Murtra
C Bennett
E Wang
G Zipf
HR Lewis
J Ladyman
Jordi Fortuny
N Chomsky
P Harremoës
R Ferrer-i-Cancho
R Landauer
RB Ash
T Toffoli
TM Cover
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/10/2013

This article studies the emergence of ambiguity in communication through the concept of logical irreversibility and within the framework of Shannon's information theory. This leads to a precise and general expression of the intuition behind Zipf's vocabulary balance: a symmetry equation between the complexities of the coding and decoding processes, which imposes an unavoidable amount of logical uncertainty on natural communication. Accordingly, irreversible computations must emerge if the complexities of the coding and decoding processes are balanced in a symmetric scenario, which means that the emergence of ambiguous codes is a necessary condition for natural communication to succeed. Comment: 28 pages, 2 figures.
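One way to make the idea of "logical uncertainty" concrete is the conditional entropy of the meaning given the signal: it is zero when the code can be inverted and positive when several meanings share a signal. The sketch below is only an illustration under that assumption; the function name `conditional_entropy`, the toy meanings, and the toy codes are hypothetical and are not taken from the article, and the snippet does not reproduce the article's symmetry equation.

```python
import math
from collections import defaultdict


def conditional_entropy(meaning_probs, code):
    """H(meaning | signal) in bits for a deterministic code meaning -> signal."""
    # Probability of each signal is the total probability of the meanings mapped to it.
    signal_prob = defaultdict(float)
    for meaning, p in meaning_probs.items():
        signal_prob[code[meaning]] += p
    # For a deterministic code, p(meaning | signal) = p(meaning) / p(signal).
    h = 0.0
    for meaning, p in meaning_probs.items():
        if p > 0:
            h -= p * math.log2(p / signal_prob[code[meaning]])
    return h


# Hypothetical toy setting: four equally likely meanings.
meanings = {"m1": 0.25, "m2": 0.25, "m3": 0.25, "m4": 0.25}
reversible_code = {"m1": "s1", "m2": "s2", "m3": "s3", "m4": "s4"}
ambiguous_code = {"m1": "s1", "m2": "s1", "m3": "s2", "m4": "s3"}

print(conditional_entropy(meanings, reversible_code))
print(conditional_entropy(meanings, ambiguous_code))
```

The reversible code leaves no uncertainty for the decoder, while the ambiguous code, which reuses one signal for two meanings, leaves a positive residual uncertainty.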

arXiv.org e-Print Archive

Crossref

Testing the robustness of laws of polysemy and brevity versus frequency

Author: A Corral
A Kilgarriff
B MacWhinney
C Fellbaum
EG Altmann
F Font-Clos
G Fenk-Oczlon
GK Zipf
J Baixeries
J Ke
M Razavi
N Ide
R Ferrer-i-Cancho
R Newson
RH Baayen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

The pioneering research of G.K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. Here we focus on two of them: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. We evaluate the robustness of these laws in contexts where, to our knowledge, they have not yet been explored. The recovery of the laws in these new conditions supports the hypothesis that they originate from abstract mechanisms. Peer reviewed. Postprint (author's final draft).
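The law of abbreviation can be checked on a corpus with a rank correlation between word frequency and word length; a negative value is consistent with the law. The following is a minimal sketch using a Spearman correlation on a toy token list. The choice of Spearman correlation, the helper names, and the toy text are assumptions made for illustration and may differ from the tests used in the paper.

```python
from collections import Counter


def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5  # undefined if either variable is constant


def frequency_length_correlation(tokens):
    """Correlate each word type's frequency with its length in characters."""
    counts = Counter(tokens)
    words = list(counts)
    return spearman([counts[w] for w in words], [len(w) for w in words])


# Hypothetical toy corpus in which frequent words are short.
toy_tokens = "a a a a of of of the the cat cat window".split()
print(frequency_length_correlation(toy_tokens))
```

The same function could be applied to the meaning-frequency law by replacing word length with a per-word count of senses, which would require a sense inventory that is not provided here.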

Crossref

UPCommons. Portal del coneixement obert de la UPC

Folksonomies and clustering in the collaborative system CiteULike

Author: Andrea Capocci
Berendt B
Caldarelli G
Cattuto C Loreto V
Ferrer i Cancho R
Guido Caldarelli
Heyman P Garcia-Molina H
Hotho A
Lambiotte R
Santos-Neto E Ripeanu M Iamnitchi A
Schmitz C Grahl M Hotho A Stumme G Cattuto C Baldassarri A Loreto V Servedio V D P
Simon H A
Zipf G K
Publication venue: 'IOP Publishing'
Publication date: 16/10/2007

We analyze CiteULike, an online collaborative tagging system where users bookmark and annotate scientific papers. Such a system can be naturally represented as a tripartite graph whose nodes represent papers, users and tags connected by individual tag assignments. The semantics of tags is studied here in order to uncover the hidden relationships between tags. We find that the clustering coefficient reflects the semantic patterns among tags, providing useful ideas for the design of more efficient methods of data classification and spam detection. Comment: 9 pages, 5 figures, iop style; corrected typo
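As a rough illustration of the representation described above, tag assignments can be stored as (user, paper, tag) triples, projected onto a tag-tag graph, and summarized with a local clustering coefficient. The projection rule used here (two tags are linked when they were assigned to the same paper) and all names and data are assumptions for this sketch; the paper may define its tag network and clustering measure differently.

```python
from collections import defaultdict
from itertools import combinations


def tag_graph(assignments):
    """Project (user, paper, tag) triples onto an undirected tag-tag graph.

    Assumption: two tags are neighbours if at least one paper carries both.
    """
    tags_by_paper = defaultdict(set)
    for _user, paper, tag in assignments:
        tags_by_paper[paper].add(tag)
    neighbours = defaultdict(set)
    for tags in tags_by_paper.values():
        for a, b in combinations(sorted(tags), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return dict(neighbours)


def clustering_coefficient(neighbours, node):
    """Fraction of pairs of a node's neighbours that are themselves linked."""
    nbrs = neighbours.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbours[a])
    return 2 * links / (k * (k - 1))


# Hypothetical tag assignments, not real CiteULike data.
assignments = [
    ("u1", "p1", "networks"),
    ("u1", "p1", "graphs"),
    ("u2", "p1", "clustering"),
    ("u2", "p2", "networks"),
    ("u3", "p2", "zipf"),
]
graph = tag_graph(assignments)
for tag in sorted(graph):
    print(tag, clustering_coefficient(graph, tag))
```

In this toy data the tag "graphs" sits inside a closed triangle, whereas "networks" also links to an unrelated tag and therefore has a lower coefficient.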

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca della Scuola IMT Alti Studi Lucca

IMT Institutional Repository

Point-occurrence self-similarity in crackling-noise systems and in other complex systems

Author: Abramowitz M
Bak P
Beggs J M
Binney J J
Chapman C R
Christensen K
Corral A Ferrer i Cancho R Díaz-Guilera A
Daley D J
Gardner J K
Goh K-I
Jensen H J
Kadanoff L P
Kanamori H
Laurson L
Lennartz S
Lowen S B
Malamud B D
Molchan G
Mulargia F
Osorio I Frei M G Sornette D Milton J Lai Y-C
Ossó A Corral A Llebot J E
Sornette D
Zipf G K
Álvaro Corral
Publication venue: 'IOP Publishing'
Publication date: 28/09/2008

It has recently been found that a number of systems displaying crackling noise also show a remarkable behavior regarding the temporal occurrence of successive events versus their size: a scaling law for the probability distributions of waiting times as a function of a minimum size is fulfilled, signaling the existence in those systems of self-similarity in time-size. This property is also present in some non-crackling systems. Here, the uncommon character of the scaling law is illustrated with simple marked renewal processes, built by definition with no correlations. Whereas processes with a finite mean waiting time do not fulfill a scaling law in general and tend towards a Poisson process in the limit of very high sizes, processes without a finite mean tend to another class of distributions, characterized by double power-law waiting-time densities. This is somewhat reminiscent of the generalized central limit theorem. A model with short-range correlations is not able to escape from the attraction of those limit distributions. A discussion of open problems in the modeling of these properties is provided. Comment: Submitted to J. Stat. Mech. for the proceedings of UPON 2008 (Lyon), topic: crackling noise
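The quantity studied above, the waiting time between successive events whose size is at least a chosen minimum, can be sketched as follows. The rescaling by the mean rate of retained events is a common way to compare distributions across thresholds and is stated here as an assumption, not as the paper's exact procedure; the function names and the toy event series are hypothetical.

```python
def waiting_times(event_times, event_sizes, min_size):
    """Waiting times between successive events with size >= min_size."""
    kept = [t for t, s in zip(event_times, event_sizes) if s >= min_size]
    return [later - earlier for earlier, later in zip(kept, kept[1:])]


def rescale_by_rate(waits):
    """Multiply waiting times by the mean rate, so the rescaled mean is 1."""
    mean_wait = sum(waits) / len(waits)
    rate = 1.0 / mean_wait
    return [rate * w for w in waits]


# Hypothetical event series: occurrence times and sizes.
times = [0.0, 1.0, 3.0, 6.0, 10.0]
sizes = [1, 5, 2, 7, 3]

for threshold in (1, 3):
    waits = waiting_times(times, sizes, threshold)
    print(threshold, waits, rescale_by_rate(waits))
```

If a scaling law of this kind holds, histograms of the rescaled waiting times for different thresholds should collapse onto a single curve; checking that requires far more events than this toy series contains.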

arXiv.org e-Print Archive

Crossref

RECERCAT

Coherent oscillations in word-use data from 1700 to 2008

Author: A Clauset
A-L Barabási
AM Petersen
C Castellano
C Darwin
D Watts
E Alvarez-Lacalle
E Lieberman
EA Pechenick
EG Altmann
G Cocho
GK Zipf
HS Heaps
J Gao
J-B Michel
JM Hughes
JM Twenge
M Gerlach
M Pagel
M Sigman
MA Montemurro
MA Nowak
ME Newman
MS Morgan
PM Greenfield
R Ferrer-i-Cancho
RD Gray
V Bochkarev
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

In written language, the choice of specific words is constrained by both grammatical requirements and the specific semantic context of the message to be transmitted. To a significant degree, the semantic context is in turn affected by a broad cultural and historical environment, which also influences matters of style and manners. Over time, those environmental factors leave an imprint on the statistics of language use, with some words becoming more common and others falling out of favor. Here we characterize the patterns of language use over time based on word statistics extracted from more than 4.5 million books written over a period of 308 years. We find evidence of novel systematic oscillatory patterns in word use with a consistent period narrowly distributed around 14 years. The specific phase relationships between different words show structure at two independent levels: first, there is a weak global phase modulation that is primarily linked to overall shifts in the vocabulary across time; and second, a stronger component dependent on well-defined semantic relationships between words. In particular, complex network analysis reveals that semantically related words show strong phase coherence. Ultimately, these previously unknown patterns in the statistics of language may be a consequence of changes in the cultural framework that influences the thematic focus of writers.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital

Open Research Online (The Open University)

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

The University of Manchester - Institutional Repository

Zipf's Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals

Author: A Clauset
A Corral
A Oceák
A Saichev
AL Barabasi
AS Bregman
B Corominas-Murtra
B Liu
B Manaris
BCJ Moore
BD Malamud
BS Wilson
C Cattuto
CD Manning
D Sornette
DH Zanette
E Bigand
E Zwicker
EM Kramer
EW Montroll
GJ Peterson
GK Zipf
H Fletcher
HA Simon
I Eliazar
J Haitsma
J Plazak
JJ Aucouturier
Joan Serrà
JP Sethna
KJ Hsü
LA Adamic
M Beltrán del Río
M Bethge
M Mitzenmacher
M Müller
MA Casey
Martín Haro
MD Hauser
MEJ Newman
MF Assaneo
N Chater
P Bak
Perfecto Herrera
R Baeza-Yates
R Ferrer i Cancho
RE Berg
RF Voss
S Harding
SS Stevens
TF Quatieri
V Madisetti
Yamir Moreno
Álvaro Corral
Publication venue: Public Library of Science
Publication date: 01/01/2012

Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words while, in the case of environmental sounds, such database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources.
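A rank-frequency distribution and a rough Zipf exponent can be obtained from any sequence of discrete symbols, such as code-words. The sketch below counts symbols, sorts the counts by rank, and fits a straight line to log frequency versus log rank. The least-squares fit is a crude estimator chosen for brevity and is an assumption of this sketch; it is not necessarily the fitting procedure used in the paper, and the function names and synthetic data are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(symbols):
    """Frequencies of distinct symbols, sorted from most to least frequent."""
    return sorted(Counter(symbols).values(), reverse=True)


def zipf_exponent(frequencies):
    """Least-squares slope of log(frequency) against log(rank), sign-flipped.

    For an ideal Zipfian distribution f(r) proportional to r**(-alpha),
    this returns alpha.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -slope


# Synthetic frequencies following f(r) = 1200 / r, i.e. exponent one.
synthetic = [1200 / rank for rank in range(1, 7)]
print(zipf_exponent(synthetic))

# Frequencies computed from a hypothetical symbol sequence.
print(rank_frequency("aaabbc"))
```

On real data the low-frequency tail is noisy, so a fit restricted to a range of ranks or a maximum-likelihood estimate is usually more reliable than a fit over all ranks.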

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Digital.CSIC

UPF Digital Repository

Prescriptions of Traditional Chinese Medicine Are Specific to Cancer Types and Adjustable to Temperature Changes

Author: A Jemal
B Patwardhan
C Furusawa
CC Liu
CL Sawyers
D Hanahan
DE Gerber
ES Harris
FM Wu
GG Zhang
Hsin-Ying Hsieh
HY Hsieh
J Cuzick
JJ Sung
L Adamic
LQ Ha
MC Yu
N Wiseman
Pei-Hsun Chiu
PU Unschuld
R Ferrer-i-Cancho
Rakesh K. Srivastava
RL Axtell
RW Benz
S Aggarwal
S-C Wang
S-Z Yang
SC Hsieh
Shizhen Li
Sun-Chong Wang
VA Miller
W Li
Z Zhang
Publication venue: Public Library of Science
Publication date: 16/02/2012

Targeted cancer therapies, with specific molecular targets, ameliorate the side effects of radiation and chemotherapy and also point to the development of personalized medicine. Combinations of drugs targeting multiple pathways of carcinogenesis are potentially more fruitful. Traditional Chinese medicine (TCM) has been tailoring herbal mixtures for individualized healthcare for two thousand years. A systematic study of the patterns of TCM formulas and herbs prescribed for cancers is therefore valuable. We analysed a total of 187,230 TCM prescriptions for 30 types of cancer in Taiwan in 2007, a year's worth of records from the National Health Insurance reimbursement database (Taiwan). We found that a TCM cancer prescription consists on average of two formulas and four herbs. We show that the percentage weights of TCM formulas and herbs in a TCM prescription follow Zipf's law with an exponent around 0.6. TCM prescriptions for benign neoplasms have a larger Zipf's exponent than those for malignant cancers. Furthermore, we show that TCM prescriptions, via weighted combinations of formulas and herbs, are specific not only to the malignancy of neoplasms but also to the sites of origin of malignant cancers. From the effects of the formulas and the natures of the herbs that were heavily prescribed for cancers, it can be proposed that cancers are a ‘warm and stagnant’ syndrome in TCM, suggesting anti-inflammatory regimens for better prevention and treatment of cancers. We show that TCM incorporated relevant formulas into the prescriptions for cancer patients with a secondary morbidity. We compared TCM prescriptions made in different seasons and identified temperature as the environmental factor that correlates with changes in TCM prescriptions in Taiwan. Lung cancer patients were among the patients whose prescriptions were adjusted when temperatures dropped. The findings of our study provide insight into TCM cancer treatment, helping dialogue between modern western medicine and TCM for better cancer care.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare

Speech Graphs Provide a Quantitative Measure of Thought Disorder in Psychosis

Author: A Ma'ayan
AJ Holanda
AM Turing
Ana C. Pieretti
B Bollobás
CM Bishop
CT Butts
DM Blei
E Bullmore
F Moretti
GH John
Guillermo A. Cecchi
HI Kaplan
I Skre
J Cohen
JBW Williams
K Börner
M Bales
M Hall
M Sigman
M Singh
Mauro Copelli
ME Costa
MH First
Natalia B. Mota
Nathalia Lemos
Nivaldo A. P. Vasconcelos
Osame Kinouchi
P Bech
R Ferrer i Cancho
RE Hoffman
Ricard V. Solé
RO Duda
RR Grinker
SB Kotsiantis
Sidarta Ribeiro
SR Kay
T Fawcett
TR Insel
Publication venue: Public Library of Science
Publication date: 01/01/2012

Background: Psychosis has various causes, including mania and schizophrenia. Since the differential diagnosis of psychosis is exclusively based on subjective assessments of oral interviews with patients, an objective quantification of the speech disturbances that characterize mania and schizophrenia is in order. In principle, such quantification could be achieved by the analysis of speech graphs. A graph represents a network with nodes connected by edges; in speech graphs, nodes correspond to words and edges correspond to semantic and grammatical relationships. Methodology/Principal Findings: To quantify speech differences related to psychosis, interviews with schizophrenics, manics and normal subjects were recorded and represented as graphs. Manics scored significantly higher than schizophrenics in ten graph measures. Psychopathological symptoms such as logorrhea, poor speech, and flight of thoughts were captured by the analysis even when verbosity differences were discounted. Binary classifiers based on speech graph measures sorted schizophrenics from manics with up to 93.8% sensitivity and 93.7% specificity. In contrast, sorting based on the scores of two standard psychiatric scales (BPRS and PANSS) reached only 62.5% sensitivity and specificity. Conclusions/Significance: The results demonstrate that alterations of the thought process manifested in the speech of psychotic patients can be objectively measured using graph-theoretical tools, developed to capture specific features of the normal and dysfunctional flow of thought, such as divergence and recurrence. The quantitative analysis of speech graphs is not redundant with standard psychometric scales but rather complementary, as it yields a very accurate sorting of schizophrenics and manics. Overall, the results point to automated psychiatric diagnosis based not on what is said, but on how it is said. Funding: FINEP [01.06.1092.00]; CNPq Universal [481506/2007-1]; CNPq; CAPES; Associacao Alberto Santos Dumont para Apoio a Pesquisa (AASDAP).
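The sensitivity and specificity figures quoted above are standard summaries of a binary classifier. The sketch below shows how they are computed from true and predicted labels. Which diagnosis is treated as the positive class is an assumption of this example, and the labels are made-up toy values, not data from the study.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive):
    """Return (sensitivity, specificity) for a binary classification.

    sensitivity = true positives / all actual positives
    specificity = true negatives / all actual negatives
    """
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical labels; "schizophrenia" is arbitrarily taken as the positive class.
truth = ["schizophrenia"] * 4 + ["mania"] * 4
predicted = [
    "schizophrenia", "schizophrenia", "schizophrenia", "mania",
    "mania", "mania", "mania", "schizophrenia",
]
print(sensitivity_specificity(truth, predicted, positive="schizophrenia"))
```

Swapping the positive class exchanges the two numbers, so reports of sensitivity and specificity are only interpretable once the positive class is stated.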

Public Library of Science (PLOS)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

PubMed Central

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Universidade de São Paulo

FigShare