112,954 research outputs found

    Selection of sequence motifs and generative Hopfield-Potts models for protein familiesilies

    Full text link
    Statistical models for families of evolutionary related proteins have recently gained interest: in particular pairwise Potts models, as those inferred by the Direct-Coupling Analysis, have been able to extract information about the three-dimensional structure of folded proteins, and about the effect of amino-acid substitutions in proteins. These models are typically requested to reproduce the one- and two-point statistics of the amino-acid usage in a protein family, {\em i.e.}~to capture the so-called residue conservation and covariation statistics of proteins of common evolutionary origin. Pairwise Potts models are the maximum-entropy models achieving this. While being successful, these models depend on huge numbers of {\em ad hoc} introduced parameters, which have to be estimated from finite amount of data and whose biophysical interpretation remains unclear. Here we propose an approach to parameter reduction, which is based on selecting collective sequence motifs. It naturally leads to the formulation of statistical sequence models in terms of Hopfield-Potts models. These models can be accurately inferred using a mapping to restricted Boltzmann machines and persistent contrastive divergence. We show that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models. The Hopfield patterns form interpretable sequence motifs and may be used to clusterize amino-acid sequences into functional sub-families. However, the distributed collective nature of these motifs intrinsically limits the ability of Hopfield-Potts models in predicting contact maps, showing the necessity of developing models going beyond the Hopfield-Potts models discussed here.Comment: 26 pages, 16 figures, to app. in PR

    Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling

    Full text link
    Spambot detection in online social networks is a long-lasting challenge involving the study and design of detection techniques capable of efficiently identifying ever-evolving spammers. Recently, a new wave of social spambots has emerged, with advanced human-like characteristics that allow them to go undetected even by current state-of-the-art algorithms. In this paper, we show that efficient spambots detection can be achieved via an in-depth analysis of their collective behaviors exploiting the digital DNA technique for modeling the behaviors of social network users. Inspired by its biological counterpart, in the digital DNA representation the behavioral lifetime of a digital account is encoded in a sequence of characters. Then, we define a similarity measure for such digital DNA sequences. We build upon digital DNA and the similarity between groups of users to characterize both genuine accounts and spambots. Leveraging such characterization, we design the Social Fingerprinting technique, which is able to discriminate among spambots and genuine accounts in both a supervised and an unsupervised fashion. We finally evaluate the effectiveness of Social Fingerprinting and we compare it with three state-of-the-art detection algorithms. Among the peculiarities of our approach is the possibility to apply off-the-shelf DNA analysis techniques to study online users behaviors and to efficiently rely on a limited number of lightweight account characteristics

    Circadian patterns of Wikipedia editorial activity: A demographic analysis

    Get PDF
    Wikipedia (WP) as a collaborative, dynamical system of humans is an appropriate subject of social studies. Each single action of the members of this society, i.e. editors, is well recorded and accessible. Using the cumulative data of 34 Wikipedias in different languages, we try to characterize and find the universalities and differences in temporal activity patterns of editors. Based on this data, we estimate the geographical distribution of editors for each WP in the globe. Furthermore we also clarify the differences among different groups of WPs, which originate in the variance of cultural and social features of the communities of editors

    Learning activity progression in LSTMs for activity detection and early detection

    Full text link
    In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection tasks. Conventionally, when training a Recurrent Neural Network, specifically a Long Short Term Memory (LSTM) model, the training loss only considers classification error. However, we argue that the detection score of the correct activity category, or the detection score margin between the correct and incorrect categories, should be monotonically non-decreasing as the model observes more of the activity. We design novel ranking losses that directly penalize the model on violation of such monotonicities, which are used together with classification loss in training of LSTM models. Evaluation on ActivityNet shows significant benefits of the proposed ranking losses in both activity detection and early detection tasks.https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Ma_Learning_Activity_Progression_CVPR_2016_paper.htmlPublished versio
    • …
    corecore