Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling

Author: Cresci Stefano
Di Pietro Roberto
Petrocchi Marinella
Spognardi Angelo
Tesconi Maurizio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

Spambot detection in online social networks is a long-lasting challenge involving the study and design of detection techniques capable of efficiently identifying ever-evolving spammers. Recently, a new wave of social spambots has emerged, with advanced human-like characteristics that allow them to go undetected even by current state-of-the-art algorithms. In this paper, we show that efficient spambot detection can be achieved via an in-depth analysis of their collective behaviors, exploiting the digital DNA technique for modeling the behaviors of social network users. Inspired by its biological counterpart, in the digital DNA representation the behavioral lifetime of a digital account is encoded in a sequence of characters. Then, we define a similarity measure for such digital DNA sequences. We build upon digital DNA and the similarity between groups of users to characterize both genuine accounts and spambots. Leveraging such characterization, we design the Social Fingerprinting technique, which is able to discriminate between spambots and genuine accounts in both a supervised and an unsupervised fashion. We finally evaluate the effectiveness of Social Fingerprinting and we compare it with three state-of-the-art detection algorithms. Among the peculiarities of our approach is the possibility to apply off-the-shelf DNA analysis techniques to study online users' behaviors and to efficiently rely on a limited number of lightweight account characteristics.
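As a rough illustration of the encoding idea in the abstract above, the sketch below maps a timeline of account actions to a character string and compares two strings by their longest common substring. The three-letter alphabet, the action names, and the choice of longest common substring as the similarity measure are assumptions made for this example; the abstract does not specify them.

```python
# Hypothetical alphabet (an assumption for illustration only):
# A = tweet, C = reply, T = retweet.
ACTION_CODES = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode(actions):
    """Encode a chronological list of action names as a digital-DNA-style string."""
    return "".join(ACTION_CODES[a] for a in actions)


def longest_common_substring(a, b):
    """Return one longest contiguous substring shared by a and b (dynamic programming)."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match in a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous characters.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_end = i
        prev = cur
    return a[best_end - best_len:best_end]


if __name__ == "__main__":
    user_1 = encode(["tweet", "retweet", "retweet", "reply"])
    print(user_1)
    print(longest_common_substring("ATTCATT", "CATTCA"))
```

Under these assumptions, a long shared substring between two accounts would suggest similar behavior; how such pairwise or group-wise similarity is turned into a detector is not covered by this sketch.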

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Online Research Database In Technology

Fundamental principles in drawing inference from sequence analysis

Author: King Tom
Publication venue: Southampton Statistical Sciences Research Institute, University of Southampton
Publication date: 15/03/2010

Individual life courses are dynamic and can be represented as a sequence of states for some portion of their experiences. More generally, such sequences have been studied in many fields across the social sciences, for example sociology, linguistics, and psychology, and the conceptualisation of subjects progressing through a sequence of states is common. However, many models and sets of data allow only for the treatment of aggregates or transitions, rather than interpreting whole sequences. The temporal aspect of the analysis is fundamental to any inference about the evolution of the subjects but assumptions about time are not normally made explicit. Moreover, without a clear idea of what sequences look like, it is impossible to determine, when something is not seen, whether it was actually absent. Some principles are proposed which link the ideas of sequences, hypothesis, analytical framework, categorisation and representation; each one being underpinned by the consideration of time. To make inferences about sequences, one needs to: understand what these sequences represent; state the hypothesis and assumptions that can be derived about sequences; identify the categories within the sequences; and choose the data representation at each stage. These ideas are obvious in themselves but they are interlinked, imposing restrictions on each other and on the inferences which can be drawn.

Southampton (e-Prints Soton)

Back-translation for discovering distant protein homologies

Author: A. Pedersen
B. Oostra
C. Kosiol
J. Leluk
J. Raes
K. Okamura
L. Arvestad
L. Delaye
M. Clamp
M. Pellegrini
P. Harrison
P. Lio
R. Blake
S. Altschul
Y. Hahn
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, the homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method to infer distant homology relations of two proteins, that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples. Comment: The 9th International Workshop in Algorithms in Bioinformatics (WABI), Philadelphia, United States of America (2009)
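To make "the complete set of putative DNA sequences of each protein" concrete, the sketch below back-translates a short peptide by explicit enumeration. This is only an illustration: the abstract describes memory-efficient graph representations rather than enumeration, which grows exponentially with protein length. The codon table is deliberately partial (four residues of the standard genetic code), and the function name is hypothetical.

```python
from itertools import product

# Partial standard genetic code: only the residues used in this example.
# A full table would cover all 20 amino acids.
CODONS = {
    "M": ["ATG"],
    "W": ["TGG"],
    "F": ["TTT", "TTC"],
    "K": ["AAA", "AAG"],
}


def back_translate(protein):
    """List every DNA sequence that encodes the given peptide (hypothetical helper)."""
    codon_choices = [CODONS[aa] for aa in protein]
    return ["".join(codons) for codons in product(*codon_choices)]


if __name__ == "__main__":
    for dna in back_translate("MFK"):
        print(dna)
```

Each residue contributes a factor equal to its number of synonymous codons, so "MFK" yields 1 x 2 x 2 = 4 putative sequences.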

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

Information profiles for DNA pattern discovery

Author: Ferreira Paulo J. S. G.
Pinho Armando J.
Pratas Diogo
Publication date: 19/01/2014

Finite-context modeling is a powerful tool for compressing and hence for representing DNA sequences. We describe an algorithm to detect genomic regularities, within a blind discovery strategy. The algorithm uses information profiles built using suitable combinations of finite-context models. We used the genome of the fission yeast Schizosaccharomyces pombe strain 972 h- for illustration, unveiling locations of low information content, which are usually associated with DNA regions of potential biological interest. Comment: Full version of DCC 2014 paper "Information profiles for DNA pattern discovery"
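A minimal sketch of what an information profile from a single finite-context model might look like: for each position, the model estimates the probability of the next symbol from counts of what followed the same preceding `order` symbols earlier in the sequence, and records the cost in bits. The additive smoothing parameter `alpha`, the single fixed order, and the function name are assumptions for this example; the abstract mentions combinations of models, which are not reproduced here.

```python
import math
from collections import defaultdict


def information_profile(seq, order=2, alpha=1.0, alphabet="ACGT"):
    """Per-symbol information content (bits) under an adaptive order-k context model.

    Every symbol of seq is assumed to belong to `alphabet`.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    profile = []
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - order):i]  # shorter context at the very start
        ctx_counts = counts[ctx]
        total = sum(ctx_counts.values())
        # Additive-smoothing estimate from counts seen so far (causal).
        p = (ctx_counts[sym] + alpha) / (total + alpha * len(alphabet))
        profile.append(-math.log2(p))
        ctx_counts[sym] += 1  # update the model only after coding the symbol
    return profile


if __name__ == "__main__":
    profile = information_profile("ACGT" * 10)
    print(round(profile[0], 3), round(profile[-1], 3))
```

In a repetitive sequence the per-symbol cost drops as the model learns the pattern, which is the kind of low-information region the abstract refers to.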

arXiv.org e-Print Archive

Crossref

Extending colonic mucosal microbiome analysis - Assessment of colonic lavage as a proxy for endoscopic colonic biopsies

Author: A Durban
A Jain
AC Ouwehand
AD Kostic
B Willing
CL O’Brien
E Pruesse
EG Zoetendal
EH Simpson
F Backhed
F Chierico Del
G Li
GL Hold
HJ Flint
HL Cash
I Mukhopadhya
I Rangel
J Handelsman
J Jalanka
J Qin
JJ Kozich
JM Choo
JR Marchesi
L Chen
L Drago
L Harrell
M Morotomi
MG Langille
MH McLean
N Segata
NA Kennedy
P Lepage
P Louis
PB Eckburg
PD Schloss
PJ Turnbaugh
R Bibiloni
R Hansen
RE Ley
RL Warren
RM Shobar
S Delgado
SJ Salter
T Vatanen
Team RC
TZ DeSantis
V Mai
Y Momozawa
Y Xie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/11/2016

This study was supported through GI Research funds and MRC Grant Ref: MR/M00533X/1 to GH. Peer reviewed. Publisher PD

Aberdeen University Research

Crossref

Springer - Publisher Connector

PubMed Central

UNSWorks

FigShare

A MOSAIC of methods: Improving ortholog detection through integration of algorithmic diversity

Author: Hernandez Ryan D.
Maher M. Cyrus
Publication date: 18/04/2014

Ortholog detection (OD) is a critical step for comparative genomic analysis of protein-coding sequences. In this paper, we begin with a comprehensive comparison of four popular, methodologically diverse OD methods: MultiParanoid, Blat, Multiz, and OMA. In head-to-head comparisons, these methods are shown to significantly outperform one another 12-30% of the time. This high complementarity motivates the presentation of the first tool for integrating methodologically diverse OD methods. We term this program MOSAIC, or Multiple Orthologous Sequence Analysis and Integration by Cluster optimization. Relative to component and competing methods, we demonstrate that MOSAIC more than quintuples the number of alignments for which all species are present, while simultaneously maintaining or improving functional-, phylogenetic-, and sequence identity-based measures of ortholog quality. Further, we demonstrate that this improvement in alignment quality yields 40-280% more confidently aligned sites. Combined, these factors translate to higher estimated levels of overall conservation, while at the same time allowing for the detection of up to 180% more positively selected sites. MOSAIC is available as a Python package. MOSAIC alignments, source code, and full documentation are available at http://pythonhosted.org/bio-MOSAIC

arXiv.org e-Print Archive

FigShare

Understanding diversity of human innate immunity receptors: analysis of surface features of leucine-rich repeat domains in NLRs and TLRs.

Author: Godzik Adam
Istomin Andrei Y
Publication venue: eScholarship, University of California
Publication date: 01/09/2009

Background: The human innate immune system uses a system of extracellular Toll-like receptors (TLRs) and intracellular Nod-like receptors (NLRs) to match the appropriate level of immune response to the level of threat from the current environment. Almost all NLRs and TLRs have a domain consisting of multiple leucine-rich repeats (LRRs), which is believed to be involved in ligand binding. LRRs, found also in thousands of other proteins, form a well-defined "horseshoe"-shaped structural scaffold that can be used for a variety of functions, from binding specific ligands to performing a general structural role. The specific functional roles of LRR domains in NLRs and TLRs are thus defined by their detailed surface features. While experimental crystal structures of four human TLRs have been solved, no structure data are available for NLRs. Results: We report a quantitative, comparative analysis of the surface features of LRR domains in human NLRs and TLRs, using predicted three-dimensional structures for NLRs. Specifically, we calculated amino acid hydrophobicity, charge, and glycosylation distributions within LRR domain surfaces and assessed their similarity by clustering. Despite differences in structural and genomic organization, comparison of LRR surface features in NLRs and TLRs allowed us to hypothesize about their possible functional similarities. We find agreement between predicted surface similarities and similar functional roles in NLRs and TLRs with known agonists, and suggest possible binding partners for uncharacterized NLRs. Conclusion: Despite its low resolution, our approach permits comparison of molecular surface features in the absence of crystal structure data. Our results illustrate diversity of surface features of innate immunity receptors and provide hints for function of NLRs whose specific role in innate immunity is yet unknown.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California