Network properties of written human language

Author: A. P. Masucci
G. J. Rodgers
G. K. Zipf
G. Orwell
H. A. Simon
S. N. Dorogovtsev
W. Li
Publication venue: 'American Physical Society (APS)'
Publication date: 08/05/2006

We investigate the nature of written human language within the framework of complex network theory. In particular, we analyse the topology of Orwell's 1984, focusing on the local properties of the network, such as the properties of the nearest neighbors and the clustering coefficient. We find a composite power-law behavior for both the average nearest neighbor's degree and the average clustering coefficient as a function of the vertex degree. This implies the existence of different functional classes of vertices. Furthermore, we find that the second-order vertex correlations are an essential component of the network architecture. To model our empirical results, we extend a previously introduced model for language due to Dorogovtsev and Mendes. We propose an accelerated growing network model that contains three growth mechanisms: linear preferential attachment, local preferential attachment, and the random growth of a pre-determined small finite subset of initial vertices. We find that with these elementary stochastic rules we are able to produce a network showing syntactic-like structures.
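
The two local quantities named in this abstract (the clustering coefficient and the average nearest-neighbor degree of a vertex) can be illustrated with a minimal sketch. The abstract does not say how the word network is built; the sketch assumes, purely for illustration, that two words are linked when they appear next to each other in the text. The function names `build_word_network`, `local_clustering`, and `average_neighbor_degree` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_word_network(words):
    """Undirected graph linking words that are adjacent in the text.

    ASSUMPTION: adjacency-based linking is chosen for illustration only;
    the abstract does not state the construction rule.
    """
    adj = defaultdict(set)
    for a, b in zip(words, words[1:]):
        if a != b:  # ignore self-loops from repeated words
            adj[a].add(b)
            adj[b].add(a)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_neighbor_degree(adj, v):
    """Mean degree of the vertices adjacent to v."""
    nbrs = adj[v]
    if not nbrs:
        return 0.0
    return sum(len(adj[u]) for u in nbrs) / len(nbrs)


if __name__ == "__main__":
    network = build_word_network("a b c a d".split())
    for word in sorted(network):
        print(word, len(network[word]),
              local_clustering(network, word),
              average_neighbor_degree(network, word))
```

Averaging these two values over all vertices of equal degree would give the degree-dependent curves the abstract refers to.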

arXiv.org e-Print Archive

Brunel University Research Archive

Reconsidering the significance of genomic word frequency

Author: Csűrös Miklós
Kucherov Gregory
Noé Laurent
Publication date: 14/09/2006

We propose that the distribution of DNA words in genomic sequences can be primarily characterized by a double Pareto-lognormal distribution, which explains lognormal and power-law features found across all known genomes. Such a distribution may be the result of completely random sequence evolution by duplication processes. The parametrization of genomic word frequencies allows for an assessment of significance for frequent or rare sequence motifs.

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

INRIA a CCSD electronic archive server

The dynamics of correlated novelties

Author: Loreto V.
Servedio V. D. P.
Strogatz S. H.
Tria F.
Publication date: 07/10/2013

One new thing often leads to another. Such correlated novelties are a familiar part of daily life. They are also thought to be fundamental to the evolution of biological systems, human society, and technology. By opening new possibilities, one novelty can pave the way for others in a process that Kauffman has called "expanding the adjacent possible". The dynamics of correlated novelties, however, have yet to be quantified empirically or modeled mathematically. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya's urn, predicts statistical laws for the rate at which novelties happen (analogous to Heaps' law) and for the probability distribution on the space explored (analogous to Zipf's law), as well as signatures of the hypothesized process by which one novelty sets the stage for another. We test these predictions on four data sets of human activity: the edit events of Wikipedia pages, the emergence of tags in annotation systems, the sequence of words in texts, and listening to new songs in online music catalogues. By quantifying the dynamics of correlated novelties, our results provide a starting point for a deeper understanding of the ever-expanding adjacent possible and its role in biological, linguistic, cultural, and technological evolution.
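
The idea of an urn whose space of possibilities enlarges whenever a novelty occurs can be sketched as follows. This is a minimal illustration, not the authors' specification: the abstract gives no parameters, so the reinforcement count `rho`, the triggering count `nu`, the number of initial colors `n_initial`, and the function name `simulate_urn` are all assumptions introduced here.

```python
import random


def simulate_urn(steps, rho=4, nu=2, n_initial=3, seed=0):
    """Polya-style urn in which drawing a never-seen color adds new colors.

    ASSUMPTIONS (not stated in the abstract): each drawn ball is returned
    with `rho` extra copies of its color, and the first draw of a color
    adds `nu + 1` balls of brand-new colors to the urn.
    """
    rng = random.Random(seed)
    urn = list(range(n_initial))   # one ball per initial color
    next_color = n_initial         # label for the next unused color
    seen = set()
    sequence = []                  # colors in the order they were drawn
    distinct_counts = []           # distinct colors drawn so far, per step
    for _ in range(steps):
        color = rng.choice(urn)
        sequence.append(color)
        urn.extend([color] * rho)  # reinforcement of the drawn color
        if color not in seen:      # a novelty enlarges the space
            seen.add(color)
            urn.extend(range(next_color, next_color + nu + 1))
            next_color += nu + 1
        distinct_counts.append(len(seen))
    return sequence, distinct_counts


if __name__ == "__main__":
    seq, distinct = simulate_urn(10_000)
    print("distinct colors after 10000 draws:", distinct[-1])
```

Plotting `distinct_counts` against the step index on log-log axes gives the Heaps-like growth curve, and counting color frequencies in `sequence` gives the Zipf-like rank-frequency curve.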

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

Co-word maps of biotechnology: an example of cognitive scientometrics

Author: Courtial J.-P.
Rip A.
Publication date: 01/01/1984

To analyse developments of scientific fields, scientometrics provides useful tools, provided one is prepared to take the content of scientific articles into account. Such cognitive scientometrics is illustrated by using as data a ten-year period of articles from a biotechnology core journal. After coding with key-words, the relations between articles are brought out by co-word analysis. Maps of the field are given, showing connections between areas and their change over time, and with respect to the institutions in which research is performed. In addition, other approaches are explored, including an indicator of 'theoretical level' of bodies of articles.

University of Twente Research Information

Rank-frequency relation for Chinese characters

Author: Allahverdyan A. E.
Deng W. B.
Li B.
Wang Q. A.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/01/2014

We show that Zipf's law for Chinese characters holds perfectly for sufficiently short texts (a few thousand different characters). The scenario of its validity is similar to that of Zipf's law for words in short English texts. For long Chinese texts (or for mixtures of short Chinese texts), rank-frequency relations for Chinese characters display a two-layer, hierarchic structure that combines a Zipfian power-law regime for frequent characters (first layer) with an exponential-like regime for less frequent characters (second layer). For these two layers we provide different (though related) theoretical descriptions that include the range of low-frequency characters (hapax legomena). The comparative analysis of rank-frequency relations for Chinese characters versus English words illustrates the extent to which the characters play for Chinese writers the same role as the words for those writing within alphabetical systems.

Comment: To appear in European Physical Journal B (EPJ B), 2014 (22 pages, 7 figures)
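
A rank-frequency relation of the kind discussed in this abstract can be computed with a short sketch. The function names `rank_frequency` and `zipf_exponent` are hypothetical, and the least-squares fit on log-log values is a crude estimator chosen for brevity, not the fitting procedure used by the authors.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent token first."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_exponent(rank_freq):
    """Slope magnitude of log(frequency) against log(rank).

    ASSUMPTION: an ordinary least-squares fit over all ranks, used here
    only for illustration. Needs at least two distinct ranks.
    """
    xs = [math.log(r) for r, _ in rank_freq]
    ys = [math.log(f) for _, f in rank_freq]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Tokens may be characters (as for Chinese text) or words.
    text = "a" * 12 + "b" * 6 + "c" * 4 + "d" * 3
    pairs = rank_frequency(text)
    print(pairs)
    print(zipf_exponent(pairs))
```

Restricting the fit to the most frequent ranks would separate a power-law first layer from the remaining tail, which is the kind of split the abstract describes for long texts.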

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)