Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling
Spambot detection in online social networks is a long-lasting challenge
involving the study and design of detection techniques capable of efficiently
identifying ever-evolving spammers. Recently, a new wave of social spambots has
emerged, with advanced human-like characteristics that allow them to go
undetected even by current state-of-the-art algorithms. In this paper, we show
that efficient spambot detection can be achieved via an in-depth analysis of
their collective behaviors, exploiting the digital DNA technique for modeling
the behaviors of social network users. Inspired by its biological counterpart,
in the digital DNA representation the behavioral lifetime of a digital account
is encoded in a sequence of characters. Then, we define a similarity measure
for such digital DNA sequences. We build upon digital DNA and the similarity
between groups of users to characterize both genuine accounts and spambots.
Leveraging such characterization, we design the Social Fingerprinting
technique, which is able to discriminate between spambots and genuine accounts in
both a supervised and an unsupervised fashion. Finally, we evaluate the
effectiveness of Social Fingerprinting and compare it with three
state-of-the-art detection algorithms. Among the peculiarities of our approach
is the possibility to apply off-the-shelf DNA analysis techniques to study
online users' behaviors and to efficiently rely on a limited number of
lightweight account characteristics.
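The encoding and group-similarity steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the mapping of tweets, replies, and retweets to the characters A, C, and T is an assumed alphabet, and group similarity is taken to be the length of the longest common substring (LCS) shared by at least k of the sequences, computed by brute force, which is only practical for short toy inputs. The names `encode`, `max_support_by_length`, and `lcs_curve` are hypothetical.

```python
from collections import defaultdict

# Assumed illustrative alphabet: one character per type of account action.
ALPHABET = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode(actions):
    """Encode a chronological list of actions as a digital DNA string."""
    return "".join(ALPHABET[action] for action in actions)


def max_support_by_length(seqs):
    """Map each substring length L to the largest number of sequences
    that share at least one common substring of length L."""
    support = {}
    for length in range(1, max(len(s) for s in seqs) + 1):
        owners = defaultdict(set)  # substring -> indices of sequences containing it
        for i, s in enumerate(seqs):
            for j in range(len(s) - length + 1):
                owners[s[j:j + length]].add(i)
        support[length] = max((len(ids) for ids in owners.values()), default=0)
    return support


def lcs_curve(seqs):
    """Map k (2..n) to the length of the longest substring common to at least k sequences."""
    support = max_support_by_length(seqs)
    return {
        k: max((length for length, n in support.items() if n >= k), default=0)
        for k in range(2, len(seqs) + 1)
    }


if __name__ == "__main__":
    group = [encode(["tweet", "tweet", "reply", "retweet", "tweet"]), "ACTAA", "CCCCC"]
    print(group[0], lcs_curve(group))  # prints: AACTA {2: 4, 3: 1}
```

Intuitively, accounts that repeat the same behavioral patterns would keep long common substrings as k grows, whereas heterogeneous accounts would not; the toy group above only illustrates how the curve is read.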
Inter- and intracontinental migrations and local differentiation have shaped the contemporary epidemiological landscape of canine parvovirus in South America
Canine parvovirus (CPV) is a fast-evolving single-stranded DNA virus that causes one of the most significant infectious diseases of dogs. Although the virus dispersed over long distances in the past, current populations are considered to be spatially confined, with only a few instances of migration between specific localities. It is unclear whether these dynamics occur in South America, where global studies have not been performed. The aim of this study is to analyze the patterns of genetic variability in South American CPV populations and explore their evolutionary relationships with global strains. Genomic sequences of sixty-three strains from South America and Europe were generated and analyzed using a phylodynamic approach. All the obtained strains belong to the CPV-2a lineage and associate with global strains in four monophyletic groups or clades. European and South American strains from all the countries analyzed here are representative of a widely distributed clade (Eur-I) that emerged in Southern Europe during 1990–98 and later spread to South America in the early 2000s. The emergence and spread of the Eur-I clade were correlated with a significant rise in the CPV effective population size in Europe and South America. The Asia-I clade includes strains from Asia and Uruguay. This clade originated in Asia during the late 1980s and evolved locally before spreading to South America during 2009–10. The third clade (Eur-II) comprises strains from Italy, Brazil, and Ecuador. This clade appears in South America as a consequence of an early introduction from Italy to Ecuador in the mid-1980s and has experienced extensive local genetic differentiation. Some strains from Argentina, Uruguay, and Brazil constitute an exclusive South American clade (SA-I) that emerged in Argentina in the 1990s.
These results indicate that the current epidemiological scenario is a consequence of inter- and intracontinental migrations of strains with different geographic and temporal origins that set the conditions for competition and local differentiation of CPV populations. The coexistence and interaction of highly divergent strains are mainly responsible for the drastic epidemiological changes observed in South America in the last two decades. This highlights the threat of invasion from external sources and the importance of whole-genome resolution to robustly infer the origin and spread of new CPV variants. From a taxonomic standpoint, the findings herein show that the classification system that uses a single amino acid to identify variants (2a, 2b, and 2c) within the CPV-2a lineage does not reflect phylogenetic relationships and is not suitable for analyzing CPV evolution. In this regard, the identification of clades or sublineages within circulating CPV strains is the first step towards a genetic and evolutionary classification of the virus.
Affiliations:
Grecco, Sofia. Universidad de la República; Uruguay
Iraola, Gregorio. Universidad de la República; Uruguay
Decaro, Nicola. Università degli Studi di Bari; Italy
Alfieri, Alice. Universidade Estadual de Londrina; Brazil
Alfieri, Amauri. Universidade Estadual de Londrina; Brazil
Gallo Calderon, Marina Beatriz. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Ciencia y Tecnología "Dr. César Milstein". Fundación Pablo Cassará. Instituto de Ciencia y Tecnología "Dr. César Milstein"; Argentina
da Silva, Ana Paula. Universidade Estadual de Londrina; Brazil
Name, Daniela. Universidad de la República. Facultad de Ciencias; Uruguay
Aldaz, Jaime. Universidad Estatal de Bolivar; Ecuador
Calleros, Lucia. Universidad de la República. Facultad de Ciencias; Uruguay
Marandino, Ana. Universidad de la República. Facultad de Ciencias; Uruguay
Gonzalo, Tomas. Universidad de la República. Facultad de Ciencias; Urugua
Kernel methods in genomics and computational biology
Support vector machines and kernel methods are increasingly popular in
genomics and computational biology, due to their good performance in real-world
applications and strong modularity that makes them suitable to a wide range of
problems, from the classification of tumors to the automatic annotation of
proteins. Their ability to work in high dimensions, to process non-vectorial
data, and the natural framework they provide to integrate heterogeneous data
are particularly relevant to various problems arising in computational biology.
In this chapter, we survey some of the most prominent applications published so
far, highlighting the particular developments in kernel methods triggered by
problems in biology, and mention a few promising research directions likely to
expand in the future.
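To make the points about non-vectorial data and data integration concrete, here is a minimal Python sketch of one well-known string kernel, the k-spectrum kernel, which compares two sequences by the counts of the length-k substrings (k-mers) they share. It is an illustrative choice, not a summary of any specific method in the chapter, and the function names are hypothetical. Because the sum of two valid kernels is again a valid kernel, adding kernels is one simple way to combine heterogeneous views of the same objects.

```python
from collections import Counter
from functools import partial


def kmer_counts(seq, k):
    """Count every length-k substring (k-mer) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Counter returns 0 for absent k-mers, so only shared k-mers contribute.
    return sum(n * cy[kmer] for kmer, n in cx.items())


def sum_kernel(k1, k2):
    """Combine two kernels (e.g. built from different data sources) by addition."""
    return lambda a, b: k1(a, b) + k2(a, b)


def gram_matrix(items, kernel):
    """Pairwise kernel values, usable as a precomputed kernel matrix."""
    return [[kernel(a, b) for b in items] for a in items]


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TTTT"]
    combined = sum_kernel(partial(spectrum_kernel, k=1), partial(spectrum_kernel, k=2))
    print(gram_matrix(seqs, combined))
```

A symmetric matrix like this one can be passed to scikit-learn's `SVC(kernel="precomputed")`, so the classifier never needs an explicit vector representation of the sequences.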
BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction
A novel discrete mathematical approach is proposed as an additional tool for molecular systematics; it does not require prior statistical assumptions about the evolutionary process. The method is based on algorithms that generate mathematical representations directly from DNA/RNA or protein sequences and then output numerical (scalar or vector) and visual (graph) characteristics. The binary-encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to obtain an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called Boolean analysis, or BOOL-AN.
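The last steps of this pipeline, building a distance matrix from binary-encoded sequences and clustering it with UPGMA, can be illustrated with a short Python sketch. It does not compute the Iterative Canonical Form: as a stand-in, it assumes a hypothetical 2-bit encoding of nucleotides and uses the Hamming distance between the resulting bit vectors, then applies standard UPGMA (average linkage). All names here are hypothetical.

```python
from itertools import combinations

# Hypothetical 2-bit nucleotide encoding; a stand-in, NOT the ICF descriptor.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def encode(seq):
    """Flatten a nucleotide string into a list of bits."""
    return [bit for base in seq for bit in BITS[base]]


def hamming(u, v):
    """Number of differing positions (assumes aligned, equal-length vectors)."""
    return sum(a != b for a, b in zip(u, v))


def distance_matrix(seqs):
    """Pairwise distances keyed by frozenset({name1, name2})."""
    vecs = {name: encode(s) for name, s in seqs.items()}
    return {frozenset((a, b)): hamming(vecs[a], vecs[b]) for a, b in combinations(vecs, 2)}


def upgma(names, dist):
    """Build a nested-tuple tree by repeatedly merging the closest pair of clusters."""
    size = {n: 1 for n in names}
    clusters = list(names)
    d = dict(dist)
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        new = (a, b)
        for c in clusters:
            if c not in (a, b):
                # Size-weighted average of the distances from both merged clusters.
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[new] = size[a] + size[b]
        clusters = [c for c in clusters if c not in (a, b)] + [new]
    return clusters[0]


if __name__ == "__main__":
    seqs = {"s1": "AAAA", "s2": "AAAC", "s3": "GGGG"}
    print(upgma(list(seqs), distance_matrix(seqs)))  # prints: ('s3', ('s1', 's2'))
```

The tree is returned as nested tuples for brevity; branch lengths, which UPGMA also yields as half the merge distance, are omitted.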
Better Safe Than Sorry: An Adversarial Approach to Improve Social Bot Detection
The arms race between spambots and spambot detectors consists of several cycles
(or generations): a new wave of spambots is created (and new spam is spread),
new spambot filters are derived, and old spambots mutate (or evolve) into new
species. Recently, with the diffusion of the adversarial learning approach, a
new practice is emerging: deliberately manipulating target samples in order to
build stronger detection models. Here, we manipulate generations of Twitter
social bots to obtain, and study, their possible future evolutions, with the
aim of eventually deriving more effective detection techniques. In detail, we
propose and experiment with a novel genetic algorithm for the synthesis of
online accounts. The algorithm makes it possible to create synthetic, evolved
versions of current state-of-the-art social bots. Results demonstrate that the
synthetic bots do escape current detection techniques. However, they also provide
all the elements needed to improve such techniques, making possible a proactive
approach to the design of social bot detection systems.
Comment: This is the pre-final version of a paper accepted at the 11th ACM
Conference on Web Science, June 30-July 3, 2019, Boston, U
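The core loop of a genetic algorithm for synthesizing evolved accounts can be sketched in Python. This is a generic, hypothetical illustration rather than the algorithm proposed in the paper: accounts are assumed to be represented as behavioral sequences over a three-character alphabet, and the fitness function is left as a parameter. In a real setting it might reward sequences that a bot detector fails to flag; in the usage example it is a toy score that counts matches against an arbitrary reference string.

```python
import random

ALPHABET = "ACT"  # assumed behavioral alphabet (e.g. tweet, reply, retweet)


def mutate(seq, rate, rng):
    """Replace each character with a random base with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c for c in seq)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(fitness, length=20, pop_size=30, generations=50, rate=0.05, seed=0):
    """Evolve sequences that maximize `fitness`; return the best one and the
    best fitness recorded at each generation."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(length)) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: max(1, pop_size // 5)]  # keep the best 20% unchanged
        parents = pop[: max(2, pop_size // 2)]  # truncation selection: top half breeds
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rate, rng))
        pop = elite + children
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history


if __name__ == "__main__":
    reference = "ACACTACCATACTACCACAT"  # arbitrary toy target, not real data

    def toy_fitness(seq):
        return sum(x == y for x, y in zip(seq, reference))

    best, history = evolve(toy_fitness)
    print(best, history[0], history[-1])
```

Elitism carries the best individuals over unchanged, so the best fitness never decreases from one generation to the next; mutation and crossover supply the exploration.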