    On the origin of ambiguity in efficient communication

    This article studies the emergence of ambiguity in communication through the concept of logical irreversibility, within the framework of Shannon's information theory. This leads to a precise and general expression of the intuition behind Zipf's vocabulary balance, in terms of a symmetry equation between the complexities of the coding and decoding processes that imposes an unavoidable amount of logical uncertainty in natural communication. Accordingly, irreversible computations must emerge if the complexities of the coding and decoding processes are balanced in a symmetric scenario, which means that the emergence of ambiguous codes is a necessary condition for natural communication to succeed.
    Comment: 28 pages, 2 figures

    Testing the robustness of laws of polysemy and brevity versus frequency

    The pioneering research of G.K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. Here we focus on two of them: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. We evaluate the robustness of these laws in contexts where, to our knowledge, they have not yet been explored. Recovering the laws under these new conditions supports the hypothesis that they originate from abstract mechanisms.
    Peer reviewed. Postprint (author's final draft)
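Both laws are statements about the sign of a correlation: frequency against length (expected negative) and frequency against number of senses (expected positive). A minimal sketch of how such a test could be run, assuming Spearman rank correlation as the statistic; the abstract does not say which statistic the authors used, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / math.sqrt(vx * vy)
```

With per-word lists `freqs`, `lengths` and `senses`, the law of abbreviation predicts `spearman_rho(freqs, lengths) < 0` and the meaning-frequency law predicts `spearman_rho(freqs, senses) > 0`; a significance test is left out of this sketch.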

    Folksonomies and clustering in the collaborative system CiteULike

    We analyze CiteULike, an online collaborative tagging system where users bookmark and annotate scientific papers. Such a system can be naturally represented as a tripartite graph whose nodes represent papers, users and tags, connected by individual tag assignments. We study the semantics of tags in order to uncover hidden relationships between them. We find that the clustering coefficient reflects the semantic patterns among tags, providing useful ideas for designing more efficient methods of data classification and spam detection.
    Comment: 9 pages, 5 figures, iop style; corrected typo
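To make "clustering coefficient among tags" concrete, here is a minimal sketch that assumes a simplification: the tripartite (user, paper, tag) data are projected onto a plain tag graph in which two tags are linked whenever they appear on the same paper, and the ordinary local clustering coefficient is computed on that projection. The paper may define clustering directly on the tripartite graph; this projection and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def tag_cooccurrence_graph(assignments):
    """Project (user, paper, tag) triples onto tags: link two tags used on the same paper.

    The user dimension is discarded, which is a simplification of the tripartite graph.
    """
    tags_by_paper = defaultdict(set)
    for _user, paper, tag in assignments:
        tags_by_paper[paper].add(tag)
    neighbors = defaultdict(set)
    for tags in tags_by_paper.values():
        for tag in tags:
            neighbors.setdefault(tag, set())  # keep tags that have no partner on this paper
        for a, b in combinations(sorted(tags), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def local_clustering(neighbors, tag):
    """Fraction of pairs of the tag's neighbours that are themselves linked."""
    nbrs = neighbors.get(tag, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
    return 2.0 * linked / (k * (k - 1))
```

A tag whose co-occurring tags are also used together gets a value near 1, which is the kind of semantic cohesion the abstract refers to.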

    Point-occurrence self-similarity in crackling-noise systems and in other complex systems

    It has recently been found that a number of systems displaying crackling noise also show a remarkable behavior regarding the temporal occurrence of successive events versus their size: the probability distributions of waiting times between events above a minimum size fulfill a scaling law as that minimum size varies, signaling self-similarity in time and size in those systems. This property is also present in some non-crackling systems. Here, the uncommon character of the scaling law is illustrated with simple marked renewal processes, which by construction have no correlations. Whereas processes with a finite mean waiting time do not fulfill a scaling law in general and tend towards a Poisson process in the limit of very large minimum sizes, processes without a finite mean tend to another class of distributions, characterized by double power-law waiting-time densities. This is somewhat reminiscent of the generalized central limit theorem. A model with short-range correlations is not able to escape the attraction of those limit distributions. A discussion of open problems in modeling these properties is provided.
    Comment: Submitted to J. Stat. Mech. for the proceedings of UPON 2008 (Lyon), topic: crackling noise
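The construction described above (independent waiting times, independent size marks, then keeping only events above a minimum size) can be simulated directly. A minimal sketch, assuming the caller supplies the two distributions as callables; the function name and parameters are hypothetical.

```python
import random


def thresholded_waiting_times(n_events, s_min, draw_wait, draw_size, seed=0):
    """Simulate an uncorrelated marked renewal process and return the waiting
    times between successive events whose size (mark) is at least s_min.

    draw_wait and draw_size are callables that take a random.Random instance.
    Waits and sizes are drawn independently, so the process has no correlations.
    """
    rng = random.Random(seed)
    t = 0.0
    last_kept = None
    waits = []
    for _ in range(n_events):
        t += draw_wait(rng)          # occurrence time of the next event
        if draw_size(rng) >= s_min:  # keep only events above the size threshold
            if last_kept is not None:
                waits.append(t - last_kept)
            last_kept = t
    return waits
```

For a finite-mean case one might pass `lambda r: r.expovariate(1.0)` as `draw_wait`; for a case without a finite mean, `lambda r: r.paretovariate(0.5)` (a Pareto law with exponent at most 1 has an infinite mean). Testing for a scaling law would then mean comparing the waiting-time densities obtained for several values of `s_min`; that comparison is not shown here.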

    Coherent oscillations in word-use data from 1700 to 2008

    In written language, the choice of specific words is constrained both by grammatical requirements and by the specific semantic context of the message to be transmitted. To a significant degree, the semantic context is in turn affected by a broad cultural and historical environment, which also influences matters of style and manners. Over time, those environmental factors leave an imprint on the statistics of language use, with some words becoming more common and others falling out of favor. Here we characterize the patterns of language use over time based on word statistics extracted from more than 4.5 million books written over a period of 308 years. We find evidence of novel systematic oscillatory patterns in word use, with a consistent period narrowly distributed around 14 years. The specific phase relationships between different words show structure at two independent levels: first, a weak global phase modulation that is primarily linked to overall shifts in vocabulary across time; and second, a stronger component that depends on well-defined semantic relationships between words. In particular, complex network analysis reveals that semantically related words show strong phase coherence. Ultimately, these previously unknown patterns in the statistics of language may be a consequence of changes in the cultural framework that influences the thematic focus of writers.
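One simple way to look for a dominant oscillation period in a yearly word-frequency series is a periodogram. The sketch below assumes a plain discrete Fourier transform on a mean-removed series; the abstract does not describe the authors' detrending or spectral method, and `dominant_period` is a hypothetical helper.

```python
import cmath
import math


def dominant_period(series):
    """Return the period (in samples, e.g. years) of the strongest Fourier
    component of an evenly sampled series, using a plain DFT."""
    n = len(series)
    if n < 4:
        raise ValueError("series too short")
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the zero-frequency component
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):  # frequency k / n cycles per sample
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k
```

The candidate periods are limited to `n / k`, so with 308 yearly samples the resolution near 14 years is coarse; real word-frequency series would usually also need a trend removed first.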

    Zipf's Law in Short-Time Timbral Codings of Speech, Music, and Environmental Sound Signals

    Timbre is a key perceptual feature that allows discrimination between different sounds. Timbral sensations are highly dependent on the temporal evolution of the power spectrum of an audio signal. In order to quantitatively characterize such sensations, the shape of the power spectrum has to be encoded in a way that preserves certain physical and perceptual properties. Therefore, it is common practice to encode short-time power spectra using psychoacoustical frequency scales. In this paper, we study and characterize the statistical properties of such encodings, here called timbral code-words. In particular, we report on rank-frequency distributions of timbral code-words extracted from 740 hours of audio coming from disparate sources such as speech, music, and environmental sounds. Analogously to text corpora, we find a heavy-tailed Zipfian distribution with exponent close to one. Importantly, this distribution is found independently of different encoding decisions and regardless of the audio source. Further analysis of the intrinsic characteristics of the most and least frequent code-words reveals that the most frequent code-words tend to have a more homogeneous structure. We also find that speech and music databases have specific, distinctive code-words, while in the case of environmental sounds such database-specific code-words are not present. Finally, we find that a Yule-Simon process with memory provides a reasonable quantitative approximation for our data, suggesting the existence of a common simple generative mechanism for all considered sound sources.
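A rank-frequency distribution is obtained by counting each distinct code-word and sorting the counts in decreasing order; a Zipfian distribution with exponent close to one means frequency falls roughly as 1/rank. A minimal sketch, assuming a least-squares fit on log-log axes as a rough estimator (it is known to be biased, and maximum-likelihood fits are generally preferred); the fitting method used in the paper is not stated here, and the function names are hypothetical.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Frequencies of distinct code-words (or words), most frequent first."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent_lsq(freqs):
    """Rough Zipf exponent: minus the least-squares slope of log f versus log rank."""
    if len(freqs) < 2:
        raise ValueError("need at least two ranks")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

The same two functions apply unchanged to word tokens from a text corpus, which is the comparison the abstract draws.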

    Prescriptions of Traditional Chinese Medicine Are Specific to Cancer Types and Adjustable to Temperature Changes

    Targeted cancer therapies, with specific molecular targets, mitigate the side effects of radiation and chemotherapy and also point toward the development of personalized medicine. Combinations of drugs targeting multiple pathways of carcinogenesis are potentially more fruitful. Traditional Chinese medicine (TCM) has been tailoring herbal mixtures for individualized healthcare for two thousand years. A systematic study of the patterns of TCM formulas and herbs prescribed for cancers is valuable. We analysed a total of 187,230 TCM prescriptions for 30 types of cancer in Taiwan in 2007, a year's worth of records from the National Health Insurance reimbursement database (Taiwan). We found that a TCM cancer prescription consists on average of two formulas and four herbs. We show that the percentage weights of TCM formulas and herbs in a TCM prescription follow Zipf's law with an exponent of around 0.6. TCM prescriptions for benign neoplasms have a larger Zipf exponent than those for malignant cancers. Furthermore, we show that TCM prescriptions, via weighted combinations of formulas and herbs, are specific not only to the malignancy of neoplasms but also to the sites of origin of malignant cancers. Based on the effects of the formulas and the natures of the herbs that were heavily prescribed for cancers, we propose that cancers are a ‘warm and stagnant’ syndrome in TCM, which suggests anti-inflammatory regimens for better prevention and treatment of cancers. We show that TCM incorporated relevant formulas into the prescriptions for cancer patients with a secondary morbidity. We compared TCM prescriptions made in different seasons and identified temperature as the environmental factor that correlates with changes in TCM prescriptions in Taiwan. Lung cancer patients were among those whose prescriptions were adjusted when temperatures dropped. The findings of our study provide insight into TCM cancer treatment, helping the dialogue between modern Western medicine and TCM for better cancer care.

    Speech Graphs Provide a Quantitative Measure of Thought Disorder in Psychosis

    Background: Psychosis has various causes, including mania and schizophrenia. Since the differential diagnosis of psychosis is based exclusively on subjective assessments of oral interviews with patients, an objective quantification of the speech disturbances that characterize mania and schizophrenia is needed. In principle, such quantification could be achieved by the analysis of speech graphs. A graph represents a network with nodes connected by edges; in speech graphs, nodes correspond to words and edges correspond to semantic and grammatical relationships. Methodology/Principal Findings: To quantify speech differences related to psychosis, interviews with schizophrenics, manics and normal subjects were recorded and represented as graphs. Manics scored significantly higher than schizophrenics in ten graph measures. Psychopathological symptoms such as logorrhea, poor speech, and flight of thoughts were captured by the analysis even when differences in verbosity were discounted. Binary classifiers based on speech graph measures sorted schizophrenics from manics with up to 93.8% sensitivity and 93.7% specificity. In contrast, sorting based on the scores of two standard psychiatric scales (BPRS and PANSS) reached only 62.5% sensitivity and specificity. Conclusions/Significance: The results demonstrate that alterations of the thought process manifested in the speech of psychotic patients can be objectively measured using graph-theoretical tools, developed to capture specific features of the normal and dysfunctional flow of thought, such as divergence and recurrence. The quantitative analysis of speech graphs is not redundant with standard psychometric scales but rather complementary, as it yields a very accurate sorting of schizophrenics and manics. Overall, the results point to automated psychiatric diagnosis based not on what is said, but on how it is said.
    Funding: FINEP [01.06.1092.00]; CNPq Universal [481506/2007-1]; CNPq; CAPES; Associacao Alberto Santos Dumont para Apoio a Pesquisa (AASDAP)
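To illustrate what a speech graph can look like, the sketch below assumes the simplest construction: one node per distinct word and a directed edge from each word to the next word in the transcript. The abstract describes edges as semantic and grammatical relationships, so this word-adjacency rule is an assumption, and the counts returned are hypothetical examples of graph measures, not the ten measures reported in the study.

```python
from collections import Counter


def speech_graph_stats(words):
    """Build a directed word graph from a transcript (one node per distinct word,
    one edge per pair of consecutive words) and return a few simple counts."""
    words = [w.lower() for w in words]
    transitions = Counter(zip(words, words[1:]))  # (from_word, to_word) -> occurrences
    return {
        "nodes": len(set(words)),
        "edges": sum(transitions.values()),  # every transition, repeats included
        "distinct_edges": len(transitions),
        "repeated_edges": sum(1 for c in transitions.values() if c > 1),
        "self_loops": sum(c for (a, b), c in transitions.items() if a == b),
    }
```

Because counts like `edges` grow with the length of the interview, measures of this kind are usually compared on equally long word windows, in the spirit of the abstract's remark that verbosity differences were discounted.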