Network properties of written human language
We investigate the nature of written human language within the framework of complex network theory. In particular, we analyse the topology of Orwell's 1984, focusing on the local properties of the network, such as the properties of the nearest neighbors and the clustering coefficient. We find a composite power-law behavior for both the average nearest-neighbor degree and the average clustering coefficient as a function of the vertex degree. This implies the existence of different functional classes of vertices. Furthermore, we find that the second-order vertex correlations are an essential component of the network architecture. To model our empirical results we extend a previously introduced model for language due to Dorogovtsev and Mendes. We propose an accelerated growing network model that contains three growth mechanisms: linear preferential attachment, local preferential attachment and the random growth of a pre-determined small finite subset of initial vertices. We find that with these elementary stochastic rules we are able to produce a network showing syntactic-like structures.
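The two local quantities named in this abstract, the clustering coefficient and the average nearest-neighbor degree as functions of vertex degree, can be illustrated with a minimal Python sketch. It assumes a simple construction that the abstract does not specify: two words are linked whenever they appear next to each other in the text. The function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_word_network(text):
    """Link every pair of distinct words that occur next to each other.

    Assumption: adjacency in the text defines an undirected edge; the
    abstract does not state how the network was actually built.
    """
    words = text.lower().split()
    neighbors = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:  # skip self-loops
            neighbors[left].add(right)
            neighbors[right].add(left)
    return dict(neighbors)


def clustering_coefficient(neighbors, vertex):
    """Fraction of pairs of a vertex's neighbors that are linked to each other."""
    adjacent = list(neighbors[vertex])
    k = len(adjacent)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if adjacent[j] in neighbors[adjacent[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_neighbor_degree(neighbors, vertex):
    """Mean degree of the vertices adjacent to the given vertex."""
    adjacent = neighbors[vertex]
    return sum(len(neighbors[n]) for n in adjacent) / len(adjacent)


def averages_by_degree(neighbors):
    """Map each degree k to (mean clustering, mean nearest-neighbor degree)."""
    grouped = defaultdict(list)
    for vertex in neighbors:
        k = len(neighbors[vertex])
        grouped[k].append(
            (clustering_coefficient(neighbors, vertex),
             average_neighbor_degree(neighbors, vertex))
        )
    return {
        k: (sum(c for c, _ in values) / len(values),
            sum(d for _, d in values) / len(values))
        for k, values in grouped.items()
    }
```

Plotting the values returned by `averages_by_degree` against degree on logarithmic axes is one way to look for the power-law behavior the abstract describes.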
Reconsidering the significance of genomic word frequency
We propose that the distribution of DNA words in genomic sequences can be
primarily characterized by a double Pareto-lognormal distribution, which
explains lognormal and power-law features found across all known genomes. Such
a distribution may be the result of completely random sequence evolution by
duplication processes. The parametrization of genomic word frequencies allows
for an assessment of significance for frequent or rare sequence motifs.
The dynamics of correlated novelties
One new thing often leads to another. Such correlated novelties are a
familiar part of daily life. They are also thought to be fundamental to the
evolution of biological systems, human society, and technology. By opening new
possibilities, one novelty can pave the way for others in a process that
Kauffman has called "expanding the adjacent possible". The dynamics of
correlated novelties, however, have yet to be quantified empirically or modeled
mathematically. Here we propose a simple mathematical model that mimics the
process of exploring a physical, biological or conceptual space that enlarges
whenever a novelty occurs. The model, a generalization of Polya's urn, predicts
statistical laws for the rate at which novelties happen (analogous to Heaps'
law) and for the probability distribution on the space explored (analogous to
Zipf's law), as well as signatures of the hypothesized process by which one
novelty sets the stage for another. We test these predictions on four data sets
of human activity: the edit events of Wikipedia pages, the emergence of tags in
annotation systems, the sequence of words in texts, and listening to new songs
in online music catalogues. By quantifying the dynamics of correlated
novelties, our results provide a starting point for a deeper understanding of
the ever-expanding adjacent possible and its role in biological, linguistic,
cultural, and technological evolution.
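The kind of urn process this abstract describes can be illustrated with a short Python simulation. This is a sketch under stated assumptions, not the authors' exact specification: the update rule (return the drawn ball with `rho` extra copies, and add `nu + 1` balls of brand-new colours the first time a colour is drawn) and the parameter names `rho` and `nu` are assumed here, since the abstract gives no details.

```python
import random


def urn_with_triggering(steps, rho, nu, seed=0):
    """Simulate an urn whose set of colours grows whenever a novelty occurs.

    Assumed rules (not stated in the abstract):
      - the drawn ball is returned together with `rho` extra copies;
      - if its colour was never drawn before, `nu + 1` balls of
        brand-new colours are added to the urn.
    Returns the drawn sequence and the running count of distinct colours.
    """
    rng = random.Random(seed)
    urn = list(range(nu + 1))      # start with nu + 1 distinct colours
    next_colour = nu + 1           # label for the next unseen colour
    seen = set()
    sequence = []
    distinct_counts = []
    for _ in range(steps):
        ball = rng.choice(urn)     # draw with replacement
        sequence.append(ball)
        urn.extend([ball] * rho)   # reinforcement of the drawn colour
        if ball not in seen:       # a novelty enlarges the space
            seen.add(ball)
            urn.extend(range(next_colour, next_colour + nu + 1))
            next_colour += nu + 1
        distinct_counts.append(len(seen))
    return sequence, distinct_counts
```

The growth of `distinct_counts` with the number of draws plays the role of the Heaps-like law, and the frequencies of colours in `sequence` play the role of the Zipf-like law mentioned above.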
Co-word maps of biotechnology: an example of cognitive scientometrics
To analyse developments of scientific fields, scientometrics provides useful tools, provided one is prepared to take the content of scientific articles into account. Such cognitive scientometrics is illustrated by using as data a ten-year period of articles from a biotechnology core journal. After coding with keywords, the relations between articles are brought out by co-word analysis. Maps of the field are given, showing connections between areas and their change over time, and with respect to the institutions in which research is performed. In addition, other approaches are explored, including an indicator of the 'theoretical level' of bodies of articles.
Rank-frequency relation for Chinese characters
We show that Zipf's law for Chinese characters holds perfectly for
sufficiently short texts (a few thousand different characters). The scenario of
its validity is similar to that of Zipf's law for words in short English texts. For
long Chinese texts (or for mixtures of short Chinese texts), rank-frequency
relations for Chinese characters display a two-layer, hierarchic structure that
combines a Zipfian power-law regime for frequent characters (first layer) with
an exponential-like regime for less frequent characters (second layer). For
these two layers we provide different (though related) theoretical descriptions
that include the range of low-frequency characters (hapax legomena). The
comparative analysis of rank-frequency relations for Chinese characters versus
English words illustrates the extent to which the characters play for Chinese
writers the same role as the words for those writing within alphabetical
systems.
Comment: To appear in European Physical Journal B (EPJ B), 2014 (22 pages, 7 figures)
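A rank-frequency relation of the kind discussed in this abstract can be computed with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the least-squares fit on log-log axes is a crude estimate of a Zipfian exponent, not the method used by the authors.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return frequencies sorted from most to least frequent (rank 1 first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 is what a pure Zipf law would give. Requires at
    least two ranks.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

For Chinese text the tokens would be single characters, for example `rank_frequency(list(text))`, while for English text they would be words, for example `rank_frequency(text.split())`. Fitting the slope separately over frequent and rare ranks is one way to look for the two regimes described above.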