31,178 research outputs found

    Social Fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling

    Spambot detection in online social networks is a long-lasting challenge involving the study and design of detection techniques capable of efficiently identifying ever-evolving spammers. Recently, a new wave of social spambots has emerged, with advanced human-like characteristics that allow them to go undetected even by current state-of-the-art algorithms. In this paper, we show that efficient spambot detection can be achieved via an in-depth analysis of their collective behaviors, exploiting the digital DNA technique for modeling the behaviors of social network users. Inspired by its biological counterpart, the digital DNA representation encodes the behavioral lifetime of a digital account as a sequence of characters. Then, we define a similarity measure for such digital DNA sequences. We build upon digital DNA and the similarity between groups of users to characterize both genuine accounts and spambots. Leveraging such characterization, we design the Social Fingerprinting technique, which is able to discriminate between spambots and genuine accounts in both a supervised and an unsupervised fashion. We finally evaluate the effectiveness of Social Fingerprinting and compare it with three state-of-the-art detection algorithms. Among the peculiarities of our approach are the possibility of applying off-the-shelf DNA analysis techniques to study online users' behaviors and the ability to rely efficiently on a limited number of lightweight account characteristics.
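
    The encoding and group-similarity idea described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the alphabet or the similarity measure, so both are assumptions here: a hypothetical three-letter alphabet for action types, and the length of the longest substring shared by every account in a group as the similarity score.

```python
from typing import Iterable, List

# Hypothetical alphabet: the abstract does not say which actions map to which
# characters, so this mapping is an assumption made for illustration only.
ACTION_TO_BASE = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode_timeline(actions: Iterable[str]) -> str:
    """Encode a chronologically ordered list of actions as a character sequence."""
    return "".join(ACTION_TO_BASE[action] for action in actions)


def group_lcs_length(sequences: List[str]) -> int:
    """Length of the longest contiguous substring shared by all sequences.

    Brute force over substrings of the shortest sequence, longest first.
    Adequate for a short illustration, not for large datasets.
    """
    if not sequences:
        return 0
    shortest = min(sequences, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in seq for seq in sequences):
                return length
    return 0


# Toy data (invented): accounts that act in lockstep share a long substring,
# while independently behaving accounts share almost nothing.
coordinated = ["ATTATTA", "CATTATT", "TTATTAC"]
independent = ["ACTAC", "TTCAA", "CATCT"]
print(group_lcs_length(coordinated), group_lcs_length(independent))
```

    A group whose members share an unusually long common substring is behaving suspiciously alike; where to place the threshold is left open here, since the abstract gives no values.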

    Inter- and intracontinental migrations and local differentiation have shaped the contemporary epidemiological landscape of canine parvovirus in South America

    Canine parvovirus (CPV) is a fast-evolving single-stranded DNA virus that causes one of the most significant infectious diseases of dogs. Although the virus dispersed over long distances in the past, current populations are considered to be spatially confined and with only a few instances of migration between specific localities. It is unclear whether these dynamics occur in South America where global studies have not been performed. The aim of this study is to analyze the patterns of genetic variability in South American CPV populations and explore their evolutionary relationships with global strains. Genomic sequences of sixty-three strains from South America and Europe were generated and analyzed using a phylodynamic approach. All the obtained strains belong to the CPV-2a lineage and associate with global strains in four monophyletic groups or clades. European and South American strains from all the countries here analyzed are representative of a widely distributed clade (Eur-I) that emerged in Southern Europe during 1990-98 to later spread to South America in the early 2000s. The emergence and spread of the Eur-I clade were correlated with a significant rise in the CPV effective population size in Europe and South America. The Asia-I clade includes strains from Asia and Uruguay. This clade originated in Asia during the late 1980s and evolved locally before spreading to South America during 2009-10. The third clade (Eur-II) comprises strains from Italy, Brazil, and Ecuador. This clade appears in South America as a consequence of an early introduction from Italy to Ecuador in the middle 1980s and has experienced extensive local genetic differentiation. Some strains from Argentina, Uruguay, and Brazil constitute an exclusive South American clade (SA-I) that emerged in Argentina in the 1990s.
    These results indicate that the current epidemiological scenario is a consequence of inter- and intracontinental migrations of strains with different geographic and temporal origins that set the conditions for competition and local differentiation of CPV populations. The coexistence and interaction of highly divergent strains are chiefly responsible for the drastic epidemiological changes observed in South America in the last two decades. This highlights the threat of invasion from external sources and the importance of whole-genome resolution to robustly infer the origin and spread of new CPV variants. From a taxonomic standpoint, the findings herein show that the classification system that uses a single amino acid to identify variants (2a, 2b, and 2c) within the CPV-2a lineage does not reflect phylogenetic relationships and is not suitable for analyzing CPV evolution. In this regard, the identification of clades or sublineages within circulating CPV strains is the first step towards a genetic and evolutionary classification of the virus.
    Fil: Grecco, Sofia. Universidad de la República; Uruguay
    Fil: Iraola, Gregorio. Universidad de la República; Uruguay
    Fil: Decaro, Nicola. Università degli Studi di Bari; Italia
    Fil: Alfieri, Alice. Universidade Estadual de Londrina; Brasil
    Fil: Alfieri, Amauri. Universidade Estadual de Londrina; Brasil
    Fil: Gallo Calderon, Marina Beatriz. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Ciencia y Tecnología "Dr. César Milstein". Fundación Pablo Cassará. Instituto de Ciencia y Tecnología "Dr. César Milstein"; Argentina
    Fil: da Silva, Ana Paula. Universidade Estadual de Londrina; Brasil
    Fil: Name, Daniela. Universidad de la República. Facultad de Ciencias; Uruguay
    Fil: Aldaz, Jaime. Universidad Estatal de Bolivar; Ecuador
    Fil: Calleros, Lucia. Universidad de la República. Facultad de Ciencias; Uruguay
    Fil: Marandino, Ana. Universidad de la República. Facultad de Ciencias; Uruguay
    Fil: Gonzalo, Tomas. Universidad de la República. Facultad de Ciencias; Urugua

    Kernel methods in genomics and computational biology

    Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable for a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimensions and to process non-vectorial data, and the natural framework they provide for integrating heterogeneous data, are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future.
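
    As one illustration of a kernel on non-vectorial data, the sketch below computes a k-mer spectrum kernel between two sequences. The abstract does not name this kernel; it is chosen here only as a familiar example of a string kernel, and the function names and the default k are assumptions.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of x and y."""
    counts_x, counts_y = kmer_counts(x, k), kmer_counts(y, k)
    # A Counter returns 0 for missing keys, so unshared k-mers contribute nothing.
    return sum(count * counts_y[kmer] for kmer, count in counts_x.items())


def normalized_spectrum_kernel(x: str, y: str, k: int = 3) -> float:
    """Cosine-normalized kernel value in [0, 1]; 0.0 if either sequence has no k-mers."""
    kxx = spectrum_kernel(x, x, k)
    kyy = spectrum_kernel(y, y, k)
    if kxx == 0 or kyy == 0:
        return 0.0
    return spectrum_kernel(x, y, k) / math.sqrt(kxx * kyy)


# Toy sequences, invented for illustration.
print(normalized_spectrum_kernel("ACGTACGT", "ACGTTCGT", k=3))
```

    Because the kernel is an inner product of count vectors, a matrix of pairwise values can be handed to any kernel method that accepts a precomputed Gram matrix, without ever building an explicit feature vector per sequence.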

    BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

    A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called Boolean analysis, or BOOL-AN.
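
    The last stage of the pipeline described here, going from a distance matrix to a tree by UPGMA, can be sketched as follows. This is not the BOOL-AN software and does not compute the ICF descriptor: a plain Hamming distance on aligned sequences stands in for the descriptor-based distances, and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def hamming_distance(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs: Dict[str, str]) -> Dict[FrozenSet, float]:
    """Pairwise distances keyed by the unordered pair of sequence names."""
    return {
        frozenset((a, b)): hamming_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }


def upgma(names: List[str], dist: Dict[FrozenSet, float]):
    """Build a tree (nested tuples of names) by average-linkage clustering."""
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # The distance to the merged cluster is the size-weighted average.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Toy aligned sequences, invented for illustration.
seqs = {"A": "AAAAAAAA", "B": "AAAAAAAT", "C": "AAAATTTT", "D": "AAATTTTT"}
print(upgma(list(seqs), distance_matrix(seqs)))
```

    Swapping in another distance, such as one computed from descriptor vectors, only requires replacing `hamming_distance`; the clustering step is unchanged.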

    Better Safe Than Sorry: An Adversarial Approach to Improve Social Bot Detection

    The arms race between spambots and spambot detectors is made of several cycles (or generations): a new wave of spambots is created (and new spam is spread), new spambot filters are derived, and old spambots mutate (or evolve) into new species. Recently, with the diffusion of the adversarial learning approach, a new practice is emerging: manipulating target samples on purpose in order to build stronger detection models. Here, we manipulate generations of Twitter social bots to obtain - and study - their possible future evolutions, with the aim of eventually deriving more effective detection techniques. In detail, we propose and experiment with a novel genetic algorithm for the synthesis of online accounts. The algorithm allows the creation of synthetic evolved versions of current state-of-the-art social bots. Results demonstrate that synthetic bots do escape current detection techniques. However, they give all the needed elements to improve such techniques, making possible a proactive approach to the design of social bot detection systems. Comment: This is the pre-final version of a paper accepted @ 11th ACM Conference on Web Science, June 30-July 3, 2019, Boston, U