
    Predictability of extreme events in social media

    It is part of our daily social-media experience that seemingly ordinary items (videos, news, publications, etc.) unexpectedly gain an enormous amount of attention. Here we investigate how unexpected these events are. We propose a method that, given some information on the items, quantifies the predictability of events, i.e., the potential of identifying the most successful items in advance, defined as the upper bound on the quality of any prediction based on the same information. Applying this method to different data, ranging from views of YouTube videos to posts in Usenet discussion groups, we invariably find that the predictability increases for the most extreme events. This indicates that, despite the inherently stochastic collective dynamics of users, efficient prediction is possible for the most extreme events. Comment: 13 pages, 3 figures
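
    The abstract above defines predictability as the best performance any predictor could reach using the same early information. As a rough illustration (not the authors' estimator), the sketch below computes an empirical upper bound on the precision of spotting the eventual top items from a single early feature, by ranking feature bins by how enriched they are in extreme items; the feature, toy data, and bin count are illustrative assumptions.

```python
import numpy as np

def predictability_upper_bound(feature, success, top_fraction=0.01, n_bins=50):
    """Best achievable precision for flagging the top `top_fraction` most
    successful items using only `feature` (an early observable): discretize
    the feature, rank bins by their fraction of extreme items, and fill the
    prediction budget from the most enriched bins downward."""
    n = len(success)
    n_top = max(1, int(top_fraction * n))
    extreme = success >= np.sort(success)[-n_top]           # eventual "big hits"
    edges = np.quantile(feature, np.linspace(0, 1, n_bins)[1:-1])
    bin_idx = np.digitize(feature, edges)

    hits = np.bincount(bin_idx, weights=extreme.astype(float))
    counts = np.bincount(bin_idx)
    order = np.argsort(-(hits / np.maximum(counts, 1)))     # most enriched bins first

    budget, found = n_top, 0.0
    for b in order:
        take = min(budget, counts[b])
        found += hits[b] * take / max(counts[b], 1)          # expected hits from this bin
        budget -= take
        if budget == 0:
            break
    return found / n_top                                     # optimal expected precision

# toy usage: an early-views feature that weakly predicts final views
rng = np.random.default_rng(0)
early = rng.pareto(1.5, 100_000)
final = early * rng.lognormal(0.0, 1.0, early.size)
print(predictability_upper_bound(early, final, top_fraction=0.001))
```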

    Stochastic dynamics and the predictability of big hits in online videos

    The competition for the attention of users is a central element of the Internet. Crucial issues are the origin and predictability of big hits, the few items that capture a large portion of the total attention. We address these issues by analyzing 10 million time series of video views from YouTube. We find that the average gain of views is proportional to the number of views a video already has, in agreement with the usual rich-get-richer mechanisms and Gibrat's law, but this fails to explain the prevalence of big hits. The reason is that the fluctuations around the average views are themselves heavy tailed. Based on these empirical observations, we propose a stochastic differential equation with L\'evy noise as a model of the dynamics of videos. We show that this model is substantially better at estimating the probability of an ordinary item becoming a big hit, which is considerably underestimated in traditional proportional-growth models. Comment: manuscript, 8 pages and 5 figures
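
    To make the contrast concrete, the following sketch simulates proportional growth with either Gaussian-like or heavy-tailed multiplicative noise. It is a toy Euler-type discretization of an SDE of the kind described above, not the paper's fitted model, and all parameter values are illustrative.

```python
import numpy as np
from scipy.stats import levy_stable

def simulate_views(n_videos=10_000, n_days=100, growth=0.02,
                   alpha=1.5, scale=0.05, seed=1):
    """Euler-type scheme for dN = growth * N * dt + N * dL_t, where dL_t is
    symmetric alpha-stable (Levy) noise. alpha = 2 recovers Gaussian,
    Gibrat-style proportional growth; alpha < 2 gives heavy-tailed
    fluctuations and far more big hits."""
    views = np.ones(n_videos)
    for day in range(n_days):
        noise = levy_stable.rvs(alpha, 0.0, scale=scale, size=n_videos,
                                random_state=seed + day)
        views *= np.maximum(1.0 + growth + noise, 0.0)   # daily multiplicative update
        views = np.maximum(views, 1.0)                   # views cannot fall below 1
    return views

gauss_like = simulate_views(alpha=2.0)   # ~ Gibrat / log-normal outcome
heavy_tail = simulate_views(alpha=1.5)   # Levy-noise outcome
threshold = np.quantile(gauss_like, 0.999)
print("P(big hit) under Gaussian noise:", np.mean(gauss_like > threshold))
print("P(big hit) under Levy noise:    ", np.mean(heavy_tail > threshold))
```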

    Extracting information from S-curves of language change

    It is well accepted that the adoption of innovations is described by S-curves (slow start, accelerating period, and slow end). In this paper, we analyze how much information on the dynamics of innovation spreading can be obtained from a quantitative description of S-curves. We focus on the adoption of linguistic innovations, for which detailed databases of written texts from the last 200 years allow for an unprecedented statistical precision. Combining data analysis with simulations of simple models (e.g., the Bass dynamics on complex networks), we identify signatures of endogenous and exogenous factors in the S-curves of adoption. We propose a measure to quantify the strength of these factors and three different methods to estimate it from S-curves. We find cases in which the exogenous factors are dominant (in the adoption of German orthographic reforms and of one irregular verb) and cases in which endogenous factors are dominant (in the adoption of conventions for the romanization of Russian names and in the regularization of most studied verbs). These results show that the shape of the S-curve is not universal and contains information on the adoption mechanism. (Published in J. R. Soc. Interface, vol. 11, no. 101 (2014) 1044; DOI: http://dx.doi.org/10.1098/rsif.2014.1044.) Comment: 9 pages, 5 figures; Supplementary Material is available at http://dx.doi.org/10.6084/m9.figshare.122178
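
    For context, the Bass dynamics mentioned above combines an exogenous (external-influence) rate p with an endogenous (imitation) rate q. The minimal sketch below integrates the classic mean-field Bass equation with illustrative parameter values to show how the two regimes shape the S-curve; it is not the network version or the estimation procedure used in the paper.

```python
import numpy as np

def bass_adoption(p, q, T=100.0, dt=0.1):
    """Integrate the Bass model dF/dt = (p + q*F) * (1 - F), where F is the
    adopter fraction, p the exogenous (external-influence) rate and q the
    endogenous (imitation) rate. Returns the S-curve F(t)."""
    steps = int(T / dt)
    F = np.zeros(steps)
    for t in range(1, steps):
        F[t] = F[t - 1] + dt * (p + q * F[t - 1]) * (1.0 - F[t - 1])
    return F

exogenous_dominated = bass_adoption(p=0.05, q=0.1)    # e.g. a decreed reform
endogenous_dominated = bass_adoption(p=0.001, q=0.5)  # imitation-driven spread
# The imitation-driven curve is roughly symmetric around its inflection point,
# while the exogenous-dominated one rises fastest at the start -- the kind of
# shape difference the proposed measure tries to quantify.
```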

    The Process Manager in the ATLAS DAQ System

    The Process Manager is the component responsible for launching and controlling processes in the ATLAS DAQ system. The tasks of the Process Manager can be coarsely grouped into three categories: process creation, control, and monitoring. Process creation implies creating the actual process on behalf of different users and preparing all the resources and data needed to actually start the process. Process control mainly covers process termination and UNIX signal dispatching. Process monitoring implies both giving state information on request and initiating call-backs to notify clients that processes have changed state. This paper describes the design and implementation of the DAQ Process Manager for the ATLAS experiment. Since the Process Manager is at the basis of the DAQ control system, it must be extremely robust and tolerate the failure of any other DAQ service. Particular emphasis is given to the testing and quality assurance procedures carried out to validate this component.
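
    To illustrate the three task groups listed above (creation, control, monitoring), here is a deliberately simplified, stand-alone Python sketch of a process manager. It shares nothing with the actual ATLAS implementation; the class and method names are invented for the example.

```python
import signal
import subprocess

class ProcessManager:
    """Toy illustration only: process creation, control (signal dispatch /
    termination) and monitoring with state-change callbacks."""

    def __init__(self):
        self._procs = {}       # name -> Popen handle
        self._callbacks = []   # each called as cb(name, new_state)

    def create(self, name, cmd, env=None):
        """Launch a process and report it as RUNNING."""
        self._procs[name] = subprocess.Popen(cmd, env=env)
        self._notify(name, "RUNNING")

    def dispatch_signal(self, name, sig=signal.SIGTERM):
        """Control: forward a UNIX signal to a managed process."""
        self._procs[name].send_signal(sig)

    def state(self, name):
        """Monitoring on request."""
        proc = self._procs[name]
        return "RUNNING" if proc.poll() is None else f"EXITED({proc.returncode})"

    def on_state_change(self, callback):
        self._callbacks.append(callback)

    def poll(self):
        """Periodic check; fires callbacks for processes that have terminated."""
        for name, proc in list(self._procs.items()):
            if proc.poll() is not None:
                self._notify(name, f"EXITED({proc.returncode})")
                del self._procs[name]

    def _notify(self, name, state):
        for cb in self._callbacks:
            cb(name, state)

# usage sketch
pm = ProcessManager()
pm.on_state_change(lambda name, state: print(name, "->", state))
pm.create("worker", ["sleep", "1"])
```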

    2D Zernike polynomial expansion: finding the protein-protein binding regions

    We present a method for efficiently and effectively assessing whether, and where, two proteins can interact with each other to form a complex. This is still largely an open problem, even for those relatively few cases where the 3D structure of both proteins is known. In fact, even if much of the information about the interaction is encoded in the chemical and geometric features of the structures, the set of possible contact patches and of their relative orientations is too large to be explored in a computationally affordable time, thus preventing the compilation of a reliable interactome. Our method rapidly and quantitatively measures the geometric shape complementarity between interacting proteins by comparing their molecular iso-electron-density surfaces and expanding the surface patches in terms of 2D Zernike polynomials. We first test the method against the real binding regions of a large dataset of known protein complexes, reaching a success rate of 0.72. We then apply the method to the blind recognition of binding sites, identifying the real region of interaction in about 60% of the analyzed cases. Finally, we investigate how the efficiency in finding the right binding region depends on the surface roughness, as a function of the expansion order.
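
    As a rough illustration of the mathematical machinery, the sketch below computes 2D Zernike expansion coefficients of a surface patch rendered as an image on the unit disk. The sampling, normalization, and use of coefficient moduli as rotation-invariant descriptors are common conventions, not necessarily the authors' exact pipeline.

```python
import numpy as np
from math import factorial

def zernike_radial(n, m, rho):
    """Radial polynomial R_n^m(rho), for n >= |m| with n - |m| even."""
    m = abs(m)
    out = np.zeros_like(rho)
    for k in range((n - m) // 2 + 1):
        c = ((-1) ** k * factorial(n - k) /
             (factorial(k) * factorial((n + m) // 2 - k) * factorial((n - m) // 2 - k)))
        out += c * rho ** (n - 2 * k)
    return out

def zernike_moments(patch, order):
    """Coefficients A_{n,m} of a square image `patch` seen as a function on the
    unit disk; |A_{n,m}| is the usual rotation-invariant shape descriptor."""
    size = patch.shape[0]
    y, x = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    mask = rho <= 1.0
    dA = (2.0 / size) ** 2                       # pixel area on the [-1, 1]^2 grid
    moments = {}
    for n in range(order + 1):
        for m in range(-n, n + 1, 2):            # n - |m| must be even
            V = zernike_radial(n, m, rho) * np.exp(1j * m * theta)
            A = (n + 1) / np.pi * np.sum(patch[mask] * np.conj(V[mask])) * dA
            moments[(n, m)] = A
    return moments

# toy usage: descriptors of a centered disk-shaped "patch"
img = (np.hypot(*np.mgrid[-1:1:64j, -1:1:64j]) < 0.5).astype(float)
descriptor = [abs(a) for a in zernike_moments(img, order=8).values()]
```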

    GSTP1 DNA Methylation and Expression Status Is Indicative of 5-aza-2′-Deoxycytidine Efficacy in Human Prostate Cancer Cells

    DNA methylation plays an important role in carcinogenesis, and the reversibility of this epigenetic modification makes it a potential therapeutic target. To date, DNA methyltransferase inhibitors (DNMTi) have not demonstrated clinical efficacy in prostate cancer, with one of the major obstacles being the inability to monitor drug activity during the trial. Given the high frequency and specificity of GSTP1 DNA methylation in prostate cancer, we investigated whether GSTP1 is a useful marker of DNMTi treatment efficacy. LNCaP prostate cancer cells were treated with 5-aza-2′-deoxycytidine (5-aza-CdR) either as a single high dose (5–20 µM), every alternate day (0.1–10 µM), or daily (0.005–2.5 µM). A daily treatment regimen with 5-aza-CdR was optimal, with significant suppression of cell proliferation achieved at doses of 0.05 µM or greater (p<0.0001) and induction of cell death from 0.5 µM (p<0.0001). In contrast, treatment with a single high dose of 20 µM 5-aza-CdR inhibited cell proliferation but was not able to induce cell death. Demethylation of GSTP1 was observed at doses of 5-aza-CdR that induced significant suppression of cell proliferation (≥0.05 µM). Re-expression of the GSTP1 protein was observed only at doses of 5-aza-CdR (≥0.5 µM) associated with induction of cell death. Treatment of LNCaP cells with a more stable DNMTi, zebularine, required an at least 100-fold higher dose (≥50 µM) to inhibit proliferation and was less potent in inducing cell death, which corresponded to a lack of GSTP1 protein re-expression. We have shown that GSTP1 DNA methylation and protein expression status correlate with DNMTi treatment response in prostate cancer cells. Since GSTP1 is methylated in nearly all prostate cancers, our results warrant its testing as a marker of epigenetic therapy response in future clinical trials. We conclude that the DNA methylation and protein expression status of GSTP1 are good indicators of DNMTi efficacy.

    Mycobacterium tuberculosis drug-resistance testing: challenges, recent developments and perspectives

    Drug-resistance testing, or antimicrobial susceptibility testing (AST), is mandatory for Mycobacterium tuberculosis in cases of failure on standard therapy. We reviewed the different methods and techniques of phenotypic and genotypic approaches. Although multidrug-resistant and extensively drug-resistant (MDR/XDR) tuberculosis is present worldwide, AST for M. tuberculosis (AST-MTB) is still mainly performed according to the resources available rather than the drug-resistance rates. Phenotypic methods, i.e. culture-based AST, are commonly used in high-income countries to confirm susceptibility of new cases of tuberculosis. They are also used, in combination with genotypic tests, to detect resistance in tuberculosis cases with risk factors. In low-income countries, genotypic methods that screen hot-spot mutations known to confer resistance are easier to perform because they avoid the culture and biosafety constraints. Given that genotypic tests can rapidly detect the prominent mechanisms of resistance, such as rpoB mutations for rifampicin resistance, we are facing new challenges: false-resistance results (mutations that do not confer resistance) and false-susceptibility results (resistance arising from mechanisms other than the common one). Phenotypic and genotypic approaches are therefore complementary for achieving high sensitivity and specificity in detecting drug resistance and susceptibility, accurately predicting MDR/XDR cure, and gathering relevant data for resistance surveillance. Although AST-MTB was established in the 1960s, there is no consensus reference method for MIC determination against which the numerous AST-MTB techniques can be compared. This information is necessary for assessing in vitro activity and setting breakpoints for future anti-tuberculosis agents.
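
    As a minimal illustration of what a genotypic hot-spot screen does, the sketch below compares an isolate's translated rpoB sequence against a tiny lookup of substitutions commonly reported to confer rifampicin resistance. The table is an illustrative subset (M. tuberculosis codon numbering), not a validated clinical catalogue; real assays rely on far larger catalogues and typically probe DNA rather than protein.

```python
# Illustrative lookup: a few rpoB substitutions commonly reported in rifampicin
# resistance. For the example only; not a diagnostic reference.
RPOB_HOTSPOTS = {
    (435, "V"): "rifampicin resistance (commonly reported)",
    (445, "Y"): "rifampicin resistance (commonly reported)",
    (450, "L"): "rifampicin resistance (most frequently reported)",
}

def screen_rpob(reference_protein: str, isolate_protein: str):
    """Report amino-acid differences in rpoB and flag those present in the
    hot-spot table. Differences outside the table still need phenotypic AST:
    they may be neutral (risk of false resistance) or confer resistance by a
    mechanism the table misses (risk of false susceptibility)."""
    calls = []
    for pos, (ref, obs) in enumerate(zip(reference_protein, isolate_protein), start=1):
        if ref != obs:
            note = RPOB_HOTSPOTS.get((pos, obs), "uncharacterized substitution")
            calls.append((f"{ref}{pos}{obs}", note))
    return calls
```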

    Collective behavior and self-organization in neural rosette morphogenesis

    Neural rosettes develop from the self-organization of differentiating human pluripotent stem cells. This process mimics the emergence of the embryonic central nervous system primordium, i.e., the neural tube, whose formation is under close investigation because errors during this process result in severe diseases such as spina bifida and anencephaly. While neural tube formation is recognized as an example of self-organization, we still do not understand the fundamental mechanisms guiding the process. Here, we discuss the different theoretical frameworks that have been proposed to explain self-organization in morphogenesis. We show that an explanation based exclusively on stem cell differentiation cannot describe the emergence of spatial organization, and that an explanation based on patterning models cannot explain how different groups of cells can collectively migrate and produce the mechanical transformations required to generate the neural tube. We conclude that neural rosette development is a relevant experimental 2D in vitro model of morphogenesis because it is a multi-scale self-organization process that involves both cell differentiation and tissue development. Ultimately, to understand rosette formation, we first need to fully understand the complex interplay between growth, migration, cytoarchitecture organization, and cell type evolution.
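
    As a toy example of the "patterning model" class discussed above, the sketch below runs a Gray-Scott reaction-diffusion system: spatial structure emerges from a nearly uniform state through local reactions and diffusion alone, which is the ingredient such models supply and, as the abstract argues, also their limit, since nothing in them moves or deforms tissue. Parameters are standard textbook values, not fitted to rosette data.

```python
import numpy as np

def gray_scott(n=128, steps=5000, Du=0.16, Dv=0.08, F=0.035, k=0.060, seed=0):
    """Toy reaction-diffusion patterning model (Gray-Scott): two diffusing
    species with feed rate F and removal rate k. Returns the V field, which
    develops spots/stripes from a nearly uniform start."""
    rng = np.random.default_rng(seed)
    U = np.ones((n, n))
    V = np.zeros((n, n))
    V[n//2 - 5:n//2 + 5, n//2 - 5:n//2 + 5] = 0.5   # small central perturbation
    V += 0.01 * rng.random((n, n))

    def lap(Z):  # periodic 5-point Laplacian
        return (np.roll(Z, 1, 0) + np.roll(Z, -1, 0) +
                np.roll(Z, 1, 1) + np.roll(Z, -1, 1) - 4 * Z)

    for _ in range(steps):
        UVV = U * V * V
        U += Du * lap(U) - UVV + F * (1 - U)
        V += Dv * lap(V) + UVV - (F + k) * V
    return V
```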

    miRNA Signatures in Sera of Patients with Active Pulmonary Tuberculosis.

    Several studies have shown that assessing levels of specific circulating microRNAs (miRNAs) is a non-invasive, rapid, and accurate method for diagnosing diseases or detecting alterations in physiological conditions. We aimed to identify a serum miRNA signature to be used for the diagnosis of tuberculosis (TB). To account for variations due to genetic makeup, we enrolled adults from two study settings, in Europe and Africa. The following categories of subjects were considered: healthy (H), active pulmonary TB (PTB), active pulmonary TB with HIV co-infection (PTB/HIV), latent TB infection (LTBI), other pulmonary infections (OPI), and active extra-pulmonary TB (EPTB). Sera from 10 subjects of the same category were pooled and, after total RNA extraction, screened for miRNA levels by TaqMan low-density arrays. After identification of the relevant miRNAs, we refined the serum miRNA signature discriminating between H and PTB on individual subjects. Signatures were analyzed for their diagnostic performance using a multivariate logistic model and a Relevance Vector Machine (RVM) model. A leave-one-out cross-validation (LOOCV) approach was adopted to assess how both models could perform in practice. The analysis on pooled specimens identified selected miRNAs as discriminatory for the categories analyzed. On individual serum samples, we showed that 15 miRNAs serve as a signature for the H and PTB categories, with a diagnostic accuracy of 82% (CI 70.2-90.0) and 77% (CI 64.2-85.9) in the RVM and logistic classification models, respectively. Considering the different ethnicities, selecting the specific signature for the European group (10 miRNAs) increased the diagnostic accuracy to 83% (CI 68.1-92.1) and 81% (CI 65.0-90.3), respectively. The African-specific signature (12 miRNAs) increased the diagnostic accuracy to 95% (CI 76.4-99.1) and 100% (CI 83.9-100.0), respectively. Serum miRNA signatures represent an interesting source of biomarkers for TB disease, with the potential to discriminate not only between PTB and LTBI but also among the other categories.
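
    For readers unfamiliar with the validation scheme, here is a minimal sketch of leave-one-out cross-validation with a logistic model using scikit-learn. The data are random placeholders standing in for per-subject miRNA levels, and the RVM classifier used in the study is not reproduced.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# X: one row per subject, one column per candidate miRNA level; y: 0 = H, 1 = PTB.
# Placeholder data only -- stands in for a measured 15-miRNA signature.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 15))
y = rng.integers(0, 2, size=60)

model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=LeaveOneOut())  # one held-out subject per fold
print("LOOCV accuracy:", scores.mean())
```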

    Rule-based knowledge aggregation for large-scale protein sequence analysis of influenza A viruses

    Background: The explosive growth of biological data provides opportunities for new statistical and comparative analyses of large information sets, such as alignments comprising tens of thousands of sequences. In such studies, sequence annotations frequently play an essential role, and reliable results depend on metadata quality. However, the semantic heterogeneity and annotation inconsistencies in biological databases greatly increase the complexity of aggregating and cleaning metadata. Manual curation of datasets, traditionally favoured by life scientists, is impractical for studies involving thousands of records. In this study, we investigate quality issues that affect major public databases, and quantify the effectiveness of an automated metadata extraction approach that combines structural and semantic rules. We applied this approach to more than 90,000 influenza A records, to annotate sequences with protein name, virus subtype, isolate, host, geographic origin, and year of isolation.

    Results: Over 40,000 annotated influenza A protein sequences were collected by combining information from more than 90,000 documents from NCBI public databases. Metadata values were automatically extracted, aggregated and reconciled from several document fields by applying user-defined structural rules. For each property, values were recovered from ≥88.8% of records, with accuracy exceeding 96% in most cases. Because of semantic heterogeneity, each property required up to six different structural rules to be combined. Significant quality differences between databases were found: GenBank documents yield values more reliably than documents extracted from GenPept. Using a simple set of semantic rules and a reasoner, we reconstructed relationships between sequences from the same isolate, thus identifying 7,640 isolates. Validation of isolate metadata against a simple ontology highlighted more than 400 inconsistencies, leading to over 3,000 property value corrections.

    Conclusion: To overcome the quality issues inherent in public databases, automated knowledge aggregation with embedded intelligence is needed for large-scale analyses. Our results show that user-controlled, intuitive approaches based on a combination of simple rules can reliably automate various curation tasks, reducing the need for manual corrections to approximately 5% of the records. Emerging semantic technologies possess desirable features to support today's knowledge aggregation tasks, with the potential to bring immediate benefits to this field.
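
    As an illustration of what a "structural rule" can look like in practice, the sketch below applies ordered regular expressions to a GenBank-style record description to pull out subtype, year, and strain name. The patterns target the conventional influenza strain-name syntax (A/host/location/isolate/year(subtype)) and are far simpler than the rule sets described in the paper.

```python
import re

# Illustrative "structural rules": for each property, ordered regexes are tried
# against a record's free-text description until one yields a value.
RULES = {
    "subtype": [re.compile(r"\((H\d+N\d+)\)")],
    "year":    [re.compile(r"/((?:19|20)\d{2})\)?\s*\(")],
    "strain":  [re.compile(r"\((A/[^()]+/\d{2,4})")],
}

def extract(description: str) -> dict:
    """Apply each property's rules in order; the first match wins."""
    values = {}
    for prop, patterns in RULES.items():
        for pat in patterns:
            m = pat.search(description)
            if m:
                values[prop] = m.group(1)
                break
    return values

record = ("Influenza A virus (A/Puerto Rico/8/1934(H1N1)) "
          "segment 4 hemagglutinin (HA) gene, complete cds.")
print(extract(record))
# {'subtype': 'H1N1', 'year': '1934', 'strain': 'A/Puerto Rico/8/1934'}
```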