Beyond the Zipf-Mandelbrot law in quantitative linguistics
In this paper the Zipf-Mandelbrot law is revisited in the context of
linguistics. Despite its widespread popularity, the Zipf-Mandelbrot law can
only describe the statistical behaviour of a rather restricted fraction of the
total number of words contained in some given corpus. In particular, we focus
our attention on the important deviations that become statistically relevant as
larger corpora are considered and that ultimately could be understood as
salient features of the underlying complex process of language generation.
Finally, it is shown that all the different observed regimes can be accurately
encompassed within a single mathematical framework recently introduced by C.
Tsallis. Comment: 6 pages and 7 figures; minor changes in text, added reference
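As a rough illustration of the law this abstract revisits, the sketch below builds a rank-frequency list from a token sequence and evaluates the standard Zipf-Mandelbrot form f(r) = c / (r + b)^a. The function names, the toy sentence, and the parameter values are assumptions made for illustration only; they are not taken from the paper.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return word counts ordered from most to least frequent (rank 1 first)."""
    return [count for _, count in Counter(tokens).most_common()]


def zipf_mandelbrot(rank, a, b, c):
    """Standard Zipf-Mandelbrot form: f(r) = c / (r + b) ** a."""
    return c / (rank + b) ** a


# Toy corpus (hypothetical); a real study would use a large text.
tokens = "the cat and the dog and the bird".split()
freqs = rank_frequency(tokens)

# Parameter values are illustrative assumptions, not fitted estimates.
predicted = [
    zipf_mandelbrot(rank, a=1.0, b=0.0, c=freqs[0])
    for rank in range(1, len(freqs) + 1)
]
```

With b = 0 the expression reduces to the plain Zipf law; comparing `freqs` with `predicted` at high ranks is where deviations of the kind the abstract mentions would show up in a real corpus.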
Testing the robustness of laws of polysemy and brevity versus frequency
The pioneering research of G.K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. Here we focus on two of them: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. We evaluate the robustness of these laws in contexts where, to our knowledge, they have not yet been explored. The recovery of the laws in these new conditions provides support for the hypothesis that they originate from abstract mechanisms. Peer Reviewed. Postprint (author's final draft)
Investigating people: a qualitative analysis of the search behaviours of open-source intelligence analysts
The Internet and the World Wide Web have become integral parts of the lives of many modern individuals, enabling almost instantaneous communication, sharing and broadcasting of thoughts, feelings and opinions. Much of this information is publicly facing, and as such, it can be utilised in a multitude of online investigations, ranging from employee vetting and credit checking to counter-terrorism and fraud prevention/detection. However, the search needs and behaviours of these investigators are not well documented in the literature. In order to address this gap, an in-depth qualitative study was carried out in cooperation with a leading investigation company. The research contribution is an initial identification of Open-Source Intelligence investigator search behaviours, the procedures and practices that they undertake, along with an overview of the difficulties and challenges that they encounter as part of their domain. This lays the foundation for future research into the varied domain of Open-Source Intelligence gathering.
Zipf's Law in Gene Expression
Using data from gene expression databases on various organisms and tissues,
including yeast, nematodes, human normal and cancer tissues, and embryonic stem
cells, we found that the abundances of expressed genes exhibit a power-law
distribution with an exponent close to -1, i.e., they obey Zipf's law.
Furthermore, by simulations of a simple model with an intra-cellular reaction
network, we found that Zipf's law of chemical abundance is a universal feature
of cells where such a network optimizes the efficiency and faithfulness of
self-reproduction. These findings provide novel insights into the nature of the
organization of reaction dynamics in living cells. Comment: revtex, 11 pages, 3 figures, submitted to Phys. Rev. Lett.
Network properties of written human language
We investigate the nature of written human language within the framework of complex network theory. In particular, we analyse the topology of Orwell's 1984, focusing on the local properties of the network, such as the properties of the nearest neighbors and the clustering coefficient. We find a composite power-law behavior for both the average nearest-neighbor degree and the average clustering coefficient as a function of the vertex degree. This implies the existence of different functional classes of vertices. Furthermore, we find that the second-order vertex correlations are an essential component of the network architecture. To model our empirical results we extend a previously introduced model for language due to Dorogovtsev and Mendes. We propose an accelerated growing network model that contains three growth mechanisms: linear preferential attachment, local preferential attachment and the random growth of a pre-determined small finite subset of initial vertices. We find that with these elementary stochastic rules we are able to produce a network showing syntactic-like structures.
Universal scaling in sports ranking
Ranking is a ubiquitous phenomenon in human society. By clicking the web
pages of Forbes, you may find all kinds of rankings, such as world's most
powerful people, world's richest people, top-paid tennis stars, and so on and
so forth. Herewith, we study a specific kind, sports ranking systems in which
players' scores and prize money are calculated based on their performances in
attending various tournaments. A typical example is tennis. It is found that
the distributions of both scores and prize money follow universal power laws,
with exponents nearly identical for most sports fields. In order to understand
the origin of this universal scaling we focus on the tennis ranking systems. By
checking the data we find that, for any pair of players, the probability that
the higher-ranked player will top the lower-ranked opponent is proportional to
the rank difference between the pair. Such a dependence can be well fitted to a
sigmoidal function. By using this feature, we propose a simple toy model which
can simulate the competition of players in different tournaments. The
simulations yield results consistent with the empirical findings. Extensive
studies indicate the model is robust with respect to the modifications of the
minor parts. Comment: 8 pages, 7 figures
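The sigmoidal dependence of the win probability on rank difference described in this abstract can be sketched as below. This is only a minimal illustration under stated assumptions: the logistic form, the `steepness` parameter, and the function names are hypothetical choices and are not the authors' model or fitted values.

```python
import math
import random


def win_probability(rank_a, rank_b, steepness=0.1):
    """Probability that player A beats player B.

    Assumes a logistic (sigmoidal) function of the rank difference;
    a smaller rank number means a better-ranked player.
    The steepness value is a hypothetical parameter, not a fitted one.
    """
    delta = rank_b - rank_a  # positive when A is the better-ranked player
    return 1.0 / (1.0 + math.exp(-steepness * delta))


def play_match(rank_a, rank_b, rng, steepness=0.1):
    """Simulate one match and return the rank of the winner."""
    if rng.random() < win_probability(rank_a, rank_b, steepness):
        return rank_a
    return rank_b


rng = random.Random(0)  # fixed seed so repeated runs give the same sequence
winner = play_match(1, 20, rng)
```

Repeating `play_match` over a tournament bracket and accumulating scores for the winners would give a toy competition in the spirit of the one the abstract outlines.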
Bidding process in online auctions and winning strategy: rate equation approach
Online auctions have expanded rapidly over the last decade and have become a
fascinating new type of business or commercial transaction in this digital era.
Here we introduce a master equation for the bidding process that takes place in
online auctions. We find that the number of distinct bidders who bid k times,
called the k-frequent bidder, up to the t-th bidding progresses as
[expression missing]. The successfully transmitted bidding rate by the
k-frequent bidder is obtained as [expression missing], independent of
t for large t. This theoretical prediction is in agreement with empirical data.
These results imply that bidding at the last moment is a rational and effective
strategy to win in an eBay auction. Comment: 4 pages, 6 figures
Complex network analysis of literary and scientific texts
We present results from our quantitative study of statistical and network
properties of literary and scientific texts written in two languages: English
and Polish. We show that Polish texts are described by the Zipf law with the
scaling exponent smaller than the one for the English language. We also show
that the scientific texts are typically characterized by the rank-frequency
plots with relatively short range of power-law behavior as compared to the
literary texts. We then transform the texts into their word-adjacency network
representations and find another difference between the languages. For the
majority of the literary texts in both languages, the corresponding networks
revealed the scale-free structure, while this was not always the case for the
scientific texts. However, all the network representations of texts were
hierarchical. We do not observe any qualitative or quantitative difference
between the languages. However, if we look at other network statistics like the
clustering coefficient and the average shortest path length, the English texts
appear to possess a more clustered structure than do the Polish ones. This result
was attributed to differences between the grammars of the two languages, which was also
indicated in the Zipf plots. All the texts, however, show network structure
that differs from any of the Watts-Strogatz, the Barabasi-Albert, and the
Erdos-Renyi architectures.
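The word-adjacency representation and the clustering coefficient mentioned in this abstract can be sketched as follows. This is a minimal pure-Python illustration; the function names and the toy token sequence are assumptions, and the authors' actual preprocessing (punctuation handling, lemmatisation, and so on) is not specified here.

```python
from collections import defaultdict
from itertools import combinations


def word_adjacency_network(tokens):
    """Undirected graph linking every pair of words that occur side by side."""
    graph = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # ignore self-loops from repeated words
            graph[left].add(right)
            graph[right].add(left)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    nodes = list(graph)
    return sum(clustering_coefficient(graph, n) for n in nodes) / len(nodes)


# Toy token sequence (hypothetical); a real text would be tokenised first.
tokens = "a b c a d".split()
graph = word_adjacency_network(tokens)
average = average_clustering(graph)
```

The same graph structure could also be used for degree distributions or shortest path lengths, the other statistics the abstract compares across languages.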
Highlighting Current Trends in Volunteered Geographic Information
Volunteered Geographic Information (VGI) is a growing area of research. This Special Issue aims to capture the main trends in VGI research based on 16 original papers, and distinguishes between two main areas, i.e., those that deal with the characteristics of VGI and those focused on applications of VGI. The topic of quality assessment and assurance dominates the papers on VGI characteristics, whereas application-oriented work covers three main domains: human behavioral analysis, natural disasters, and land cover/land use mapping. In this Special Issue, therefore, both the challenges and the potentials of VGI are addressed.