Wikipedia Information Flow Analysis Reveals the Scale-Free Architecture of the Semantic Space
In this paper we extract the topology of the semantic space in its encyclopedic sense, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably, at the percolation threshold the semantic space is characterised by scale-free behaviour at different levels of complexity, and this relates the semantic space to a wide range of biological, social and linguistic phenomena. In particular, we find that the cluster size distribution, representing the size of different semantic areas, is scale-free. Moreover, the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However, its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process. After giving a detailed description and interpretation of the topological properties of the semantic space, we introduce a stochastic model of a content-based network, based on a copy and mutation algorithm and on Heaps' law, that is able to capture the main statistical properties of the analysed semantic space, including Zipf's law for the word frequency distribution.
Modeling Statistical Properties of Written Text
Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts, and identify dynamic word ranking and memory across documents as key mechanisms explaining the nontrivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.
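The two scaling laws named in this abstract can be measured on any token sequence. The following is a minimal sketch, not the authors' generative model: the function names and the toy sentence are illustrative assumptions. It returns the rank-ordered word frequencies (the quantity Zipf's law describes) and the running vocabulary size (the quantity Heaps' law describes).

```python
from collections import Counter


def zipf_rank_frequency(tokens):
    """Return word counts sorted from most to least frequent (rank 1 first)."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)


def heaps_vocabulary_growth(tokens):
    """Return the number of distinct words seen after each token."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


# Toy input, purely illustrative (hypothetical data, not from the paper).
tokens = "the cat sat on the mat and the dog sat on the cat".split()
freqs = zipf_rank_frequency(tokens)
growth = heaps_vocabulary_growth(tokens)
```

On a real corpus, plotting `freqs` against rank and `growth` against text length on log-log axes is the usual first check for power-law behaviour; the toy sentence is far too short to show it.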
Evolution of scaling emergence in large-scale spatial epidemic spreading
Background: Zipf's law and Heaps' law are two representatives of the scaling
concepts, which play a significant role in the study of complexity science. The
coexistence of Zipf's law and Heaps' law motivates different
understandings of the dependence between these two scalings, which has still
hardly been clarified.
Methodology/Principal Findings: In this article, we observe an evolution
process of the scalings: Zipf's law and Heaps' law naturally coexist
at early times, while a crossover comes with the emergence of
their inconsistency at later times, before a stable state is reached in which
Heaps' law persists while strict Zipf's law disappears. Such
findings are illustrated with a scenario of large-scale spatial epidemic
spreading, and the empirical results of pandemic disease support a universal
analysis of the relation between the two laws regardless of the biological
details of disease. Employing United States (U.S.) domestic air
transportation and demographic data to construct a metapopulation model for
simulating the pandemic spread at the U.S. national level, we uncover that the
broad heterogeneity of the infrastructure plays a key role in the evolution of
scaling emergence.
Conclusions/Significance: The analyses of large-scale spatial epidemic
spreading help to understand the temporal evolution of scalings, indicating that the
coexistence of Zipf's law and Heaps' law depends on the collective
dynamics of epidemic processes, and the heterogeneity of epidemic spread
indicates the significance of performing targeted containment strategies at the
early time of a pandemic disease.
Comment: 24 pages, 7 figures, accepted by PLoS ON
Coherent oscillations in word-use data from 1700 to 2008
In written language, the choice of specific words is constrained by both grammatical requirements and the specific semantic context of the message to be transmitted. To a significant degree, the semantic context is in turn affected by a broad cultural and historical environment, which also influences matters of style and manners. Over time, those environmental factors leave an imprint in the statistics of language use, with some words becoming more common and other words being preferred less. Here we characterize the patterns of language use over time based on word statistics extracted from more than 4.5 million books written over a period of 308 years. We find evidence of novel systematic oscillatory patterns in word use with a consistent period narrowly distributed around 14 years. The specific phase relationships between different words show structure at two independent levels: first, there is a weak global phase modulation that is primarily linked to overall shifts in the vocabulary across time; and second, a stronger component dependent on well-defined semantic relationships between words. In particular, complex network analysis reveals that semantically related words show strong phase coherence. Ultimately, these previously unknown patterns in the statistics of language may be a consequence of changes in the cultural framework that influences the thematic focus of writers.
Universal entropy of word ordering across linguistic families
Background
The language faculty is probably the most distinctive feature of our species, and endows us with a unique ability to exchange highly structured information. In written language, information is encoded by the concatenation of basic symbols under grammatical and semantic constraints. As is also the case in other natural information carriers, the resulting symbolic sequences show a delicate balance between order and disorder. That balance is determined by the interplay between the diversity of symbols and by their specific ordering in the sequences. Here we used entropy to quantify the contribution of different organizational levels to the overall statistical structure of language.
Methodology/Principal Findings
We computed a relative entropy measure to quantify the degree of ordering in word sequences from languages belonging to several linguistic families. While a direct estimation of the overall entropy of language yielded values that varied for the different families considered, the relative entropy quantifying word ordering presented an almost constant value for all those families.
Conclusions/Significance
Our results indicate that, despite the differences in the structure and vocabulary of the languages analyzed, the impact of word ordering on the structure of language is a statistical linguistic universal.
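The idea of isolating the contribution of word ordering can be illustrated by comparing an entropy estimate for a text with the same estimate for a shuffled copy, which keeps the word frequencies but destroys the order. The sketch below is an assumption-laden simplification: the abstract does not say which entropy estimator the authors used, so a first-order plug-in estimate of H(next word | previous word) is swapped in here, and all function names are hypothetical. Plug-in estimates are strongly biased on short texts, so this shows the principle only.

```python
import math
import random
from collections import Counter


def conditional_entropy(tokens):
    """Plug-in estimate of H(next word | previous word), in bits."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    total = len(tokens) - 1  # number of adjacent word pairs
    h = 0.0
    for (prev, _next), c in pair_counts.items():
        # p(prev, next) * log2 p(next | prev)
        h -= (c / total) * math.log2(c / prev_counts[prev])
    return h


def ordering_entropy_gap(tokens, seed=0):
    """Entropy of a shuffled copy minus entropy of the original order.

    A first-order stand-in for the relative entropy of word ordering;
    the seed only makes the shuffle reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return conditional_entropy(shuffled) - conditional_entropy(tokens)


# A perfectly ordered toy sequence: each word fully determines the next.
ordered = ["a", "b"] * 50
```

For the alternating toy sequence the original-order entropy is zero, so any positive gap comes entirely from the ordering that the shuffle removed.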
Inflammatory Gene Regulatory Networks in Amnion Cells Following Cytokine Stimulation: Translational Systems Approach to Modeling Human Parturition
A majority of the studies examining the molecular regulation of human labor have
been conducted using single gene approaches. While the technology to produce
multi-dimensional datasets is readily available, the means for facile analysis
of such data are limited. The objective of this study was to develop a systems
approach to infer regulatory mechanisms governing global gene expression in
cytokine-challenged cells in vitro, and to apply these methods
to predict gene regulatory networks (GRNs) in intrauterine tissues during term
parturition. To this end, microarray analysis was applied to human amnion
mesenchymal cells (AMCs) stimulated with interleukin-1β, and differentially
expressed transcripts were subjected to hierarchical clustering, temporal
expression profiling, and motif enrichment analysis, from which a GRN was
constructed. These methods were then applied to fetal membrane specimens
collected in the absence or presence of spontaneous term labor. Analysis of
cytokine-responsive genes in AMCs revealed a sterile immune response signature,
with promoters enriched in response elements for several inflammation-associated
transcription factors. In comparison to the fetal membrane dataset, there were
34 genes commonly upregulated, many of which were part of an acute inflammation
gene expression signature. Binding motifs for nuclear factor-κB were
prominent in the gene interaction and regulatory networks for both datasets;
however, we found little evidence to support the utilization of
pathogen-associated molecular pattern (PAMP) signaling. The tissue specimens
were also enriched for transcripts governed by hypoxia-inducible factor. The
approach presented here provides an uncomplicated means to infer global
relationships among gene clusters involved in cellular responses to
labor-associated signals.
Significance and popularity in music production
Creative industries constantly strive for fame and popularity. Though highly desirable, popularity is not the only achievement artistic creations might ever acquire. Leaving a longstanding mark in the global production and influencing future works is an even more important achievement, usually acknowledged by experts and scholars. ‘Significant’ or ‘influential’ works are not always well known to the public, or have sometimes been long forgotten by the vast majority. In this paper, we focus on the duality between what is successful and what is significant in the musical context. To this end, we consider a user-generated set of tags collected through an online music platform, whose evolving co-occurrence network mirrors the growing conceptual space underlying music production. We define a set of general metrics aimed at characterizing music albums throughout history, and their relationships with the overall musical production. We show how these metrics allow albums to be classified according to their current popularity or their inclusion in expert-made lists of important albums. In this way, we provide the scientific community and the public at large with quantitative tools to tell apart popular albums from culturally or aesthetically relevant artworks. The generality of the methodology presented here lends itself to use in all those fields where innovation and creativity are in play.
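A tag co-occurrence network of the kind this abstract mentions can be built by counting how often two tags are attached to the same album. The sketch below covers only that counting step. The album data and the function name are hypothetical, and the paper's own metrics and its treatment of the network's evolution over time are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tag_sets):
    """Count, for each unordered tag pair, the albums carrying both tags."""
    edges = Counter()
    for tags in tag_sets:
        # Sort so each pair has one canonical key; set() drops repeated tags.
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical albums, each represented by its user-assigned tags.
albums = [
    ["rock", "indie"],
    ["rock", "indie", "lo-fi"],
    ["jazz", "lo-fi"],
]
edges = cooccurrence_counts(albums)
```

Each key in `edges` is a weighted edge of the network; processing albums in release order and taking snapshots of `edges` would be one way to follow the network as it grows.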