Selection of sequence motifs and generative Hopfield-Potts models for protein families
Statistical models for families of evolutionarily related proteins have
recently gained interest: in particular pairwise Potts models, such as those
inferred by the Direct-Coupling Analysis, have been able to extract information
about the three-dimensional structure of folded proteins, and about the effect
of amino-acid substitutions in proteins. These models are typically required
to reproduce the one- and two-point statistics of the amino-acid usage in a
protein family, i.e. to capture the so-called residue conservation and
covariation statistics of proteins of common evolutionary origin. Pairwise
Potts models are the maximum-entropy models achieving this. While being
successful, these models depend on huge numbers of ad hoc introduced
parameters, which have to be estimated from a finite amount of data and whose
biophysical interpretation remains unclear. Here we propose an approach to
parameter reduction, which is based on selecting collective sequence motifs. It
naturally leads to the formulation of statistical sequence models in terms of
Hopfield-Potts models. These models can be accurately inferred using a mapping
to restricted Boltzmann machines and persistent contrastive divergence. We show
that, when applied to protein data, even 20-40 patterns are sufficient to
obtain statistically close-to-generative models. The Hopfield patterns form
interpretable sequence motifs and may be used to cluster amino-acid
sequences into functional sub-families. However, the distributed collective
nature of these motifs intrinsically limits the ability of Hopfield-Potts
models in predicting contact maps, showing the necessity of developing models
going beyond the Hopfield-Potts models discussed here.
Comment: 26 pages, 16 figures, to app. in PR
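The parameter reduction described in this abstract can be illustrated with a minimal sketch. It assumes the common low-rank form in which each pairwise coupling is a sum over p patterns, J_ij(a,b) = sum_mu xi_i^mu(a) xi_j^mu(b), with no normalization prefactor; the abstract does not give the exact convention, and all function names here are hypothetical. The sketch checks that the energy computed from the full coupling tensor equals the energy computed directly from the patterns, and compares coupling parameter counts.

```python
import numpy as np


def hopfield_potts_couplings(patterns):
    """Build couplings J[i, a, j, b] from patterns of shape (p, N, q).

    Assumed form: J_ij(a, b) = sum_mu xi[mu, i, a] * xi[mu, j, b], i != j.
    """
    J = np.einsum("mia,mjb->iajb", patterns, patterns)
    n_sites = patterns.shape[1]
    sites = np.arange(n_sites)
    J[sites, :, sites, :] = 0.0  # no self-couplings
    return J


def energy_from_couplings(seq, fields, J):
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    sites = np.arange(len(seq))
    pair = J[sites[:, None], seq[:, None], sites[None, :], seq[None, :]]
    # J is symmetric with a zero diagonal, so half the full sum is the i<j sum.
    return -fields[sites, seq].sum() - 0.5 * pair.sum()


def energy_from_patterns(seq, fields, patterns):
    """Same energy, evaluated without ever forming the coupling tensor."""
    sites = np.arange(len(seq))
    proj = patterns[:, sites, seq]  # shape (p, N): pattern entries seen by seq
    total = proj.sum(axis=1)        # projection of seq onto each pattern
    pair_term = 0.5 * ((total ** 2).sum() - (proj ** 2).sum())
    return -fields[sites, seq].sum() - pair_term


def parameter_counts(n_sites, n_states, n_patterns):
    """Coupling parameters: full pairwise Potts vs. Hopfield-Potts patterns."""
    full_pairwise = n_sites * (n_sites - 1) // 2 * n_states ** 2
    hopfield = n_patterns * n_sites * n_states
    return full_pairwise, hopfield
```

Under these assumptions the number of coupling parameters grows linearly with sequence length for a fixed number of patterns, instead of quadratically as in the full pairwise model.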
Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling
Spambot detection in online social networks is a long-lasting challenge
involving the study and design of detection techniques capable of efficiently
identifying ever-evolving spammers. Recently, a new wave of social spambots has
emerged, with advanced human-like characteristics that allow them to go
undetected even by current state-of-the-art algorithms. In this paper, we show
that efficient spambot detection can be achieved via an in-depth analysis of
their collective behaviors exploiting the digital DNA technique for modeling
the behaviors of social network users. Inspired by its biological counterpart,
in the digital DNA representation the behavioral lifetime of a digital account
is encoded in a sequence of characters. Then, we define a similarity measure
for such digital DNA sequences. We build upon digital DNA and the similarity
between groups of users to characterize both genuine accounts and spambots.
Leveraging such characterization, we design the Social Fingerprinting
technique, which is able to discriminate between spambots and genuine accounts in
both a supervised and an unsupervised fashion. We finally evaluate the
effectiveness of Social Fingerprinting and we compare it with three
state-of-the-art detection algorithms. Among the peculiarities of our approach
is the possibility to apply off-the-shelf DNA analysis techniques to study
online users' behaviors and to efficiently rely on a limited number of
lightweight account characteristics.
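As a rough illustration of the digital DNA idea in this abstract, the sketch below encodes a sequence of account actions as a string and compares two strings. The three-letter alphabet and the use of the longest common substring as the similarity measure are assumptions made for illustration; the abstract does not specify either, and the function names are hypothetical.

```python
# Hypothetical alphabet: one character per action type (an assumption).
ALPHABET = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode(actions):
    """Encode a chronological list of action names as a digital DNA string."""
    return "".join(ALPHABET[action] for action in actions)


def longest_common_substring(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common run that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def similarity(a, b):
    """Shared-substring length, normalized by the shorter sequence (0 to 1)."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return longest_common_substring(a, b) / shorter
```

With a measure of this kind, a group of accounts that share unusually long common substrings would stand out from accounts whose behavior is more varied, which is the intuition behind comparing groups rather than single accounts.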
Circadian patterns of Wikipedia editorial activity: A demographic analysis
Wikipedia (WP) as a collaborative, dynamical system of humans is an
appropriate subject of social studies. Each single action of the members of
this society, i.e. editors, is well recorded and accessible. Using the
cumulative data of 34 Wikipedias in different languages, we try to characterize
and find the universalities and differences in temporal activity patterns of
editors. Based on this data, we estimate the geographical distribution of
editors for each WP across the globe. Furthermore, we clarify the differences
among different groups of WPs, which originate in the variance of cultural and
social features of the communities of editors.
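A temporal activity pattern of the kind mentioned in this abstract can be sketched as a normalized 24-hour histogram of edit timestamps. The second function, which finds the circular shift that best aligns a profile with a reference profile, is only one assumed way of relating an activity pattern to a time-zone offset; the abstract does not describe the actual estimation method, and the function names are hypothetical.

```python
from datetime import datetime, timezone

import numpy as np


def hourly_profile(timestamps):
    """Fraction of edits in each UTC hour; expects timezone-aware datetimes."""
    counts = np.zeros(24)
    for ts in timestamps:
        counts[ts.astimezone(timezone.utc).hour] += 1
    total = counts.sum()
    return counts / total if total else counts


def best_shift(profile, reference):
    """Shift s in hours such that np.roll(reference, s) is closest to profile.

    Closeness is the sum of squared differences (an assumed criterion).
    """
    errors = [np.sum((np.roll(reference, s) - profile) ** 2) for s in range(24)]
    return int(np.argmin(errors))
```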
Learning activity progression in LSTMs for activity detection and early detection
In this work we improve training of temporal deep models to better learn activity progression for activity detection and early detection tasks. Conventionally, when training a Recurrent Neural Network, specifically a Long Short Term Memory (LSTM) model, the training loss only considers classification error. However, we argue that the detection score of the correct activity category, or the detection score margin between the correct and incorrect categories, should be monotonically non-decreasing as the model observes more of the activity. We design novel ranking losses that directly penalize the model on violation of such monotonicities, which are used together with classification loss in training of LSTM models. Evaluation on ActivityNet shows significant benefits of the proposed ranking losses in both activity detection and early detection tasks.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/html/Ma_Learning_Activity_Progression_CVPR_2016_paper.html
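The monotonicity idea in this abstract can be illustrated with a simplified sketch: at each time step, penalize the correct-class score for falling below the highest score it reached at earlier steps, and add that penalty to the usual cross-entropy. This is an assumed, reduced form written in NumPy for clarity; it is not the exact ranking loss of the paper, and the function names and the `weight` parameter are hypothetical.

```python
import numpy as np


def monotonicity_penalty(scores):
    """Per-step hinge penalty for a drop below the best earlier score.

    scores: 1-D sequence of correct-class detection scores over time.
    """
    scores = np.asarray(scores, dtype=float)
    running_max = np.maximum.accumulate(scores)
    # Best score seen strictly before each step (the first step has no past).
    prev_max = np.concatenate(([scores[0]], running_max[:-1]))
    return np.maximum(0.0, prev_max - scores)


def total_loss(probs, label, weight=1.0):
    """Mean over time of cross-entropy plus weighted monotonicity penalty.

    probs: array of shape (T, C) with per-step class probabilities.
    label: index of the correct class for the whole sequence.
    """
    probs = np.asarray(probs, dtype=float)
    correct = probs[:, label]
    cross_entropy = -np.log(correct)
    return float((cross_entropy + weight * monotonicity_penalty(correct)).mean())
```

A score sequence that never decreases incurs zero penalty, so in that case the total reduces to the plain classification loss.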