1,248 research outputs found

    Beyond the Zipf-Mandelbrot law in quantitative linguistics

    In this paper the Zipf-Mandelbrot law is revisited in the context of linguistics. Despite its widespread popularity, the Zipf-Mandelbrot law can only describe the statistical behaviour of a rather restricted fraction of the total number of words contained in a given corpus. In particular, we focus on the important deviations that become statistically relevant as larger corpora are considered and that could ultimately be understood as salient features of the underlying complex process of language generation. Finally, it is shown that all the different observed regimes can be accurately encompassed within a single mathematical framework recently introduced by C. Tsallis.
    Comment: 6 pages and 7 figures; minor changes in text, added reference
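For orientation, the Zipf-Mandelbrot form referred to in this abstract is commonly written as a frequency proportional to 1 / (rank + shift)^exponent. The sketch below only illustrates that standard form; the function name and the parameter values (`exponent=1.0`, `shift=2.7`) are assumptions chosen for illustration and are not taken from the paper, which argues that larger corpora deviate from this law.

```python
def zipf_mandelbrot(n_ranks, exponent=1.0, shift=2.7):
    """Return normalized Zipf-Mandelbrot probabilities for ranks 1..n_ranks.

    Each weight is 1 / (rank + shift) ** exponent. The default parameter
    values are illustrative assumptions, not fitted to any corpus.
    """
    weights = [1.0 / (rank + shift) ** exponent for rank in range(1, n_ranks + 1)]
    total = sum(weights)
    # Normalize so the values form a probability distribution over ranks.
    return [w / total for w in weights]


# Hypothetical usage: probabilities for a 1000-word vocabulary.
probs = zipf_mandelbrot(1000)
```

Setting `shift=0` recovers the plain Zipf law, where the most frequent word is `2 ** exponent` times as frequent as the second.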

    Testing the robustness of laws of polysemy and brevity versus frequency

    The pioneering research of G.K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. Here we focus on two of them: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. We evaluate the robustness of these laws in contexts where, to our knowledge, they have not yet been explored. The recovery of the laws in these new conditions supports the hypothesis that they originate from abstract mechanisms.
    Peer Reviewed. Postprint (author's final draft)

    Investigating people: a qualitative analysis of the search behaviours of open-source intelligence analysts

    The Internet and the World Wide Web have become integral parts of the lives of many modern individuals, enabling almost instantaneous communication, sharing and broadcasting of thoughts, feelings and opinions. Much of this information is publicly facing, and as such, it can be utilised in a multitude of online investigations, ranging from employee vetting and credit checking to counter-terrorism and fraud prevention/detection. However, the search needs and behaviours of these investigators are not well documented in the literature. To address this gap, an in-depth qualitative study was carried out in cooperation with a leading investigation company. The research contribution is an initial identification of Open-Source Intelligence investigator search behaviours and of the procedures and practices that they undertake, along with an overview of the difficulties and challenges that they encounter as part of their domain. This lays the foundation for future research into the varied domain of Open-Source Intelligence gathering.

    Zipf's Law in Gene Expression

    Using data from gene expression databases on various organisms and tissues, including yeast, nematodes, human normal and cancer tissues, and embryonic stem cells, we found that the abundances of expressed genes exhibit a power-law distribution with an exponent close to -1, i.e., they obey Zipf's law. Furthermore, by simulations of a simple model with an intra-cellular reaction network, we found that Zipf's law of chemical abundance is a universal feature of cells where such a network optimizes the efficiency and faithfulness of self-reproduction. These findings provide novel insights into the nature of the organization of reaction dynamics in living cells.
    Comment: revtex, 11 pages, 3 figures, submitted to Phys. Rev. Lett.
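As a rough illustration of what "a power-law distribution with an exponent close to -1" means in practice, the sketch below estimates the slope of a rank-abundance curve by ordinary least squares on log-log values. This is a simple, commonly used estimator and not necessarily the procedure used in the paper; the function name and the synthetic input data are assumptions made for illustration, not gene expression measurements.

```python
import math


def fit_power_law_exponent(ranks, abundances):
    """Estimate the exponent of abundance ~ rank ** exponent.

    Uses ordinary least squares on (log rank, log abundance). This is a
    rough estimator chosen for brevity, not the paper's stated method.
    """
    xs = [math.log(r) for r in ranks]
    ys = [math.log(a) for a in abundances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# Synthetic data (an assumption for illustration): abundance = 1000 / rank.
ranks = list(range(1, 501))
abundances = [1000.0 / r for r in ranks]
slope = fit_power_law_exponent(ranks, abundances)
```

On real data, least squares on log-log axes can be biased by the noisy tail, so the estimate is best read as a first check, not as a definitive fit.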

    Network properties of written human language

    We investigate the nature of written human language within the framework of complex network theory. In particular, we analyse the topology of Orwell's 1984, focusing on the local properties of the network, such as the properties of the nearest neighbors and the clustering coefficient. We find a composite power-law behavior for both the average nearest neighbor's degree and the average clustering coefficient as a function of the vertex degree. This implies the existence of different functional classes of vertices. Furthermore, we find that the second-order vertex correlations are an essential component of the network architecture. To model our empirical results, we extend a previously introduced model for language due to Dorogovtsev and Mendes. We propose an accelerated growing network model that contains three growth mechanisms: linear preferential attachment, local preferential attachment and the random growth of a pre-determined small finite subset of initial vertices. We find that with these elementary stochastic rules we are able to produce a network showing syntactic-like structures.

    Universal scaling in sports ranking

    Ranking is a ubiquitous phenomenon in human society. On the web pages of Forbes, you may find all kinds of rankings, such as the world's most powerful people, the world's richest people, top-paid tennis stars, and so on. Here we study a specific kind: sports ranking systems in which players' scores and prize money are calculated based on their performances in various tournaments. A typical example is tennis. It is found that the distributions of both scores and prize money follow universal power laws, with exponents nearly identical for most sports fields. In order to understand the origin of this universal scaling, we focus on the tennis ranking systems. By checking the data we find that, for any pair of players, the probability that the higher-ranked player will top the lower-ranked opponent is proportional to the rank difference between the pair. Such a dependence can be well fitted to a sigmoidal function. Using this feature, we propose a simple toy model which can simulate the competition of players in different tournaments. The simulations yield results consistent with the empirical findings. Extensive studies indicate that the model is robust with respect to modifications of its minor parts.
    Comment: 8 pages, 7 figures
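The abstract states that the win probability of the higher-ranked player depends on the rank difference and is well fitted by a sigmoidal function. A minimal sketch of that idea follows, assuming a logistic curve as the sigmoid; the logistic form, the `steepness` value and the function names are all assumptions for illustration, since the abstract gives neither the fitted function nor its parameters.

```python
import math
import random


def win_probability(rank_diff, steepness=0.05):
    """Probability that the higher-ranked player wins.

    rank_diff is the non-negative gap between the two ranks. A logistic
    curve is assumed here; the steepness value is purely illustrative.
    """
    return 1.0 / (1.0 + math.exp(-steepness * rank_diff))


def play_match(rank_a, rank_b, rng, steepness=0.05):
    """Simulate one match and return the winner's rank (1 is the best rank)."""
    better, worse = min(rank_a, rank_b), max(rank_a, rank_b)
    p_better_wins = win_probability(worse - better, steepness)
    return better if rng.random() < p_better_wins else worse


# Hypothetical usage: rank 1 plays rank 50 ten thousand times.
rng = random.Random(0)
wins = sum(play_match(1, 50, rng) == 1 for _ in range(10_000))
```

With equal ranks the sketch gives both players an even chance, and the advantage of the better-ranked player grows smoothly with the gap, which is the qualitative behaviour the abstract describes.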

    Bidding process in online auctions and winning strategy: rate equation approach

    Online auctions have expanded rapidly over the last decade and have become a fascinating new type of business or commercial transaction in this digital era. Here we introduce a master equation for the bidding process that takes place in online auctions. We find that the number of distinct bidders who bid $k$ times, called the $k$-frequent bidder, up to the $t$-th bidding progresses as $n_k(t) \sim t k^{-2.4}$. The successfully transmitted bidding rate by the $k$-frequent bidder is obtained as $q_k(t) \sim k^{-1.4}$, independent of $t$ for large $t$. This theoretical prediction is in agreement with empirical data. These results imply that bidding at the last moment is a rational and effective strategy to win in an eBay auction.
    Comment: 4 pages, 6 figures

    Complex network analysis of literary and scientific texts

    We present results from our quantitative study of statistical and network properties of literary and scientific texts written in two languages: English and Polish. We show that Polish texts are described by the Zipf law with a scaling exponent smaller than the one for the English language. We also show that the scientific texts are typically characterized by rank-frequency plots with a relatively short range of power-law behavior as compared to the literary texts. We then transform the texts into their word-adjacency network representations and find another difference between the languages. For the majority of the literary texts in both languages, the corresponding networks revealed a scale-free structure, while this was not always the case for the scientific texts. However, all the network representations of texts were hierarchical. We do not observe any qualitative or quantitative difference between the languages. However, if we look at other network statistics like the clustering coefficient and the average shortest path length, the English texts appear to possess a more clustered structure than the Polish ones. This result was attributed to differences in the grammar of the two languages, which was also indicated in the Zipf plots. All the texts, however, show a network structure that differs from any of the Watts-Strogatz, the Barabasi-Albert, and the Erdos-Renyi architectures.
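The word-adjacency representation and the clustering coefficient mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: tokens are obtained by lowercasing and splitting on whitespace, edges are undirected and unweighted, and self-links from repeated words are ignored. The paper's actual preprocessing is not given in the abstract, and the function names are hypothetical.

```python
from collections import defaultdict


def word_adjacency_network(text):
    """Build an undirected network linking words that appear side by side.

    Tokenization by lowercasing and whitespace splitting is an assumption
    made for brevity; punctuation handling is deliberately left out.
    """
    words = text.lower().split()
    neighbors = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:  # skip self-links from immediately repeated words
            neighbors[left].add(right)
            neighbors[right].add(left)
    return dict(neighbors)


def local_clustering(neighbors, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    nbrs = list(neighbors[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in neighbors[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


# Hypothetical usage on a toy sentence.
network = word_adjacency_network("the cat saw the dog and the cat saw the bird")
```

Averaging `local_clustering` over all nodes gives one common definition of the network's clustering coefficient, which is the kind of statistic the abstract compares between English and Polish texts.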

    Highlighting Current Trends in Volunteered Geographic Information

    Volunteered Geographic Information (VGI) is a growing area of research. This Special Issue aims to capture the main trends in VGI research based on 16 original papers, and distinguishes between two main areas: papers that deal with the characteristics of VGI and papers focused on applications of VGI. The topic of quality assessment and assurance dominates the papers on VGI characteristics, whereas application-oriented work covers three main domains: human behavioral analysis, natural disasters, and land cover/land use mapping. This Special Issue therefore addresses both the challenges and the potential of VGI.