Search CORE

DNA-inspired online behavioral modeling and its application to spambot detection

Author: Cresci Stefano
Di Pietro Roberto
Petrocchi Marinella
Spognardi Angelo
Tesconi Maurizio
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

We propose a strikingly novel, simple, and effective approach to model online user behavior: we extract and analyze digital DNA sequences from users' online actions, using Twitter as a benchmark to test our proposal. We obtain an incisive and compact DNA-inspired characterization of user actions. Then, we apply standard DNA analysis techniques to discriminate between genuine and spambot accounts on Twitter. An experimental campaign supports our proposal, showing its effectiveness and viability. To the best of our knowledge, we are the first to identify and adapt DNA-inspired techniques to online user behavioral modeling. While Twitter spambot detection is a specific use case on a specific social media platform, our proposed methodology is platform- and technology-agnostic, paving the way for diverse behavioral characterization tasks.

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

PUblication MAnagement

Online Research Database In Technology

Archivio istituzionale della ricerca - Università di Padova
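For the "DNA-inspired online behavioral modeling" entry above: a minimal sketch of the idea of encoding an account's timeline as a string and comparing accounts with a standard string technique. The three-letter alphabet (tweet as `A`, reply as `C`, retweet as `T`), the names `ACTION_TO_BASE` and `encode_timeline`, and the use of a two-string longest common substring are assumptions for illustration; this is not the authors' implementation, which may compare larger groups of accounts.

```python
# Hypothetical mapping from action type to a DNA-like base (assumed alphabet).
ACTION_TO_BASE = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode_timeline(actions):
    """Turn a chronological list of action names into a digital DNA string."""
    return "".join(ACTION_TO_BASE[action] for action in actions)


def longest_common_substring(a, b):
    """Return one longest common substring of a and b.

    Classic dynamic programming in O(len(a) * len(b)) time, keeping only
    the previous row so that the extra space is O(len(b)).
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-1] and b[j-1].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


bot_1 = encode_timeline(["tweet", "tweet", "retweet", "retweet", "tweet", "reply"])
bot_2 = encode_timeline(["retweet", "retweet", "tweet", "reply", "reply"])
print(bot_1, bot_2, longest_common_substring(bot_1, bot_2))  # prints: AATTAC TTACC TTAC
```

The intuition, under these assumptions, is that accounts driven by the same automation tend to share unusually long common substrings, while genuine users' timelines diverge quickly.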

Fast Arc-Annotated Subsequence Matching in Linear Space

Author: D. Harel
G. Blin
G. Lin
I. Munro
J. Alber
P. Bille
P. Damaschke
P. Kilpeläinen
T. Kida
V. Bafna
W. Chen
Publication date: 01/01/2010

An arc-annotated string is a string of characters, called bases, augmented with a set of pairs, called arcs, each connecting two bases. Given arc-annotated strings $P$ and $Q$, the arc-preserving subsequence problem is to determine if $P$ can be obtained from $Q$ by deleting bases from $Q$. Whenever a base is deleted, any arc with an endpoint in that base is also deleted. Arc-annotated strings where the arcs are "nested" are a natural model of RNA molecules that captures both their primary and secondary structure. The arc-preserving subsequence problem for nested arc-annotated strings is a basic primitive for investigating the function of RNA molecules. Gramm et al. [ACM Trans. Algorithms 2006] gave an algorithm for this problem using $O(nm)$ time and space, where $m$ and $n$ are the lengths of $P$ and $Q$, respectively. In this paper we present a new algorithm using $O(nm)$ time and $O(n + m)$ space, thereby matching the previous time bound while reducing the space from quadratic to linear. This is essential for processing large RNA molecules, where space is likely to be a bottleneck. To obtain our result we introduce several novel ideas which may be of independent interest for related problems on arc-annotated strings.
Comment: To appear in Algoritmic

arXiv.org e-Print Archive

CiteSeerX

Crossref

Online Research Database In Technology
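For the "Fast Arc-Annotated Subsequence Matching in Linear Space" entry above: the problem definition can be made concrete with a naive brute-force checker. This is only a reference sketch for tiny inputs (exponential time), not the $O(nm)$-time, $O(n + m)$-space algorithm the abstract describes; the function name and the representation of arcs as sets of `(i, j)` index pairs with `i < j` are assumptions.

```python
from itertools import combinations


def is_arc_preserving_subsequence(p, p_arcs, q, q_arcs):
    """Brute-force test: can (p, p_arcs) be obtained from (q, q_arcs) by deleting bases?

    Arcs are sets of (i, j) index pairs with i < j. Deleting a base also
    deletes every arc touching it, so the arcs left among the kept bases of q
    must be exactly the arcs of p. Exponential time: tiny inputs only.
    """
    for kept in combinations(range(len(q)), len(p)):
        # Bases must match position by position.
        if any(q[k] != base for k, base in zip(kept, p)):
            continue
        # Re-index the surviving arcs of q to positions in the kept subsequence;
        # kept is increasing, so the order i < j is preserved.
        new_index = {k: idx for idx, k in enumerate(kept)}
        surviving = {
            (new_index[i], new_index[j])
            for i, j in q_arcs
            if i in new_index and j in new_index
        }
        if surviving == set(p_arcs):
            return True
    return False


print(is_arc_preserving_subsequence("ab", {(0, 1)}, "aab", {(0, 2)}))  # True
```

A checker like this can serve as a test oracle when implementing a faster algorithm, since it follows the definition directly.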

Palindrome Recognition In The Streaming Model

Author: Azer Erfan Sadeqi
Berenbrink Petra
Ergün Funda
Mallmann-Trenn Frederik
Publication date: 28/01/2016

In the Palindrome Problem one tries to find all palindromes (palindromic substrings) in a given string. A palindrome is a string that reads the same forwards and backwards, e.g., the string "racecar". A related problem is the Longest Palindromic Substring Problem, in which it suffices to find an arbitrary one of the longest palindromes in the given string. We consider the streaming versions of both problems. In the streaming model the input arrives over time, and at every point in time we are only allowed to use sublinear space. The main algorithms in this paper are the following. The first is a one-pass randomized algorithm that solves the Palindrome Problem; it has an additive error and uses $O(\sqrt{n})$ space. The second is a two-pass algorithm that determines the exact locations of all longest palindromes; it uses the first algorithm as its first pass. The third is again a one-pass randomized algorithm, which solves the Longest Palindromic Substring Problem; it has a multiplicative error and uses only $O(\log n)$ space. We also give two variants of the first algorithm which solve other related practical problems.

arXiv.org e-Print Archive

CiteSeerX
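For the "Palindrome Recognition In The Streaming Model" entry above: a minimal sketch of Karp-Rabin-style fingerprinting, a standard tool in randomized streaming string algorithms. It solves a much simpler task than the paper (reporting which prefixes of the stream are palindromes) and is not the paper's algorithm; the function name, the prime modulus, and the random base are assumptions. Apart from the output list, it keeps a constant number of integers, and a false positive occurs only with small probability over the choice of base.

```python
import random

# A Mersenne prime used as the fingerprint modulus (assumed choice).
MODULUS = (1 << 61) - 1


def palindromic_prefix_lengths(stream, base=None):
    """Report lengths k such that the first k characters form a palindrome.

    One pass over the stream. For the prefix s[0..k-1] we maintain
      forward = sum(s[i] * base**i)         mod MODULUS
      reverse = sum(s[i] * base**(k-1-i))   mod MODULUS
    'reverse' is the forward fingerprint of the reversed prefix, so the prefix
    is (with high probability) a palindrome exactly when the two agree.
    """
    if base is None:
        base = random.randrange(2, MODULUS - 1)
    forward, reverse, power = 0, 0, 1  # power == base**(k-1) before each update
    lengths = []
    for k, ch in enumerate(stream, start=1):
        x = ord(ch)
        forward = (forward + x * power) % MODULUS  # new character gets the highest power
        reverse = (reverse * base + x) % MODULUS   # shift old terms up, new one gets base**0
        power = power * base % MODULUS
        if forward == reverse:
            lengths.append(k)
    return lengths


print(palindromic_prefix_lengths("racecar"))  # [1, 7]
```

Choosing the base at random, rather than fixing it, is what makes the collision bound hold for every input string.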

Prospects and limitations of full-text index structures in genome analysis

Author: Dawyndt Peter
De Baets Bernard
Fack Veerle
Vyverman Michaël
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2012

The combination of incessant advances in sequencing technology, producing large amounts of data, and innovative bioinformatics approaches designed to cope with this data flood has led to interesting new results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less well understood. Moreover, the last decade has seen a boom in the number of index structure variants featuring complex and diverse memory-time trade-offs. This article provides a comprehensive, state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, and the trade-offs they impose, as well as their practical limitations, are explained and compared.

Ghent University Academic Bibliography

PubMed Central
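For the "Prospects and limitations of full-text index structures in genome analysis" entry above: a minimal sketch of one classic full-text index, the suffix array, with pattern search by binary search. The naive construction (sorting suffix slices, roughly $O(n^2 \log n)$) is chosen for clarity only; production tools use linear-time or compressed constructions, and the function names here are hypothetical.

```python
def build_suffix_array(text):
    """Starting positions of all suffixes of text, in lexicographic order (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """All positions where pattern occurs, via two binary searches over sa.

    Suffixes sharing the prefix `pattern` form one contiguous block of sa.
    """
    m = len(pattern)
    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


genome = "ACGTACGTAC"
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "ACG"))  # [0, 4]
```

The memory-time trade-off the abstract mentions is visible even here: the index stores one integer per text position in exchange for $O(m \log n)$ character comparisons per query.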

When Can You Fold a Map?

Author: Arkin Esther M.
Bender Michael A.
Demaine Erik D.
Demaine Martin L.
Mitchell Joseph S. B.
Sethia Saurabh
Skiena Steven S.
Publication date: 30/08/2003

We explore the following problem: given a collection of creases on a piece of paper, each assigned a folding direction of mountain or valley, is there a flat folding by a sequence of simple folds? There are several models of simple folds; the simplest, the one-layer simple fold, rotates a portion of paper about a crease in the paper by ±180 degrees. We first consider the analogous questions one dimension lower (bending a segment into a flat object), which lead to interesting problems on strings. We develop efficient algorithms for recognizing simply foldable 1D crease patterns and for reconstructing a sequence of simple folds. Indeed, we prove that a 1D crease pattern is flat-foldable by any means precisely if it is flat-foldable by a sequence of one-layer simple folds. Next we explore simple foldability in two dimensions and find a surprising contrast: "map" folding and its variants are polynomial, but slight generalizations are NP-complete. Specifically, we develop a linear-time algorithm for deciding foldability of an orthogonal crease pattern on a rectangular piece of paper, and prove that it is (weakly) NP-complete to decide foldability of (1) an orthogonal crease pattern on an orthogonal piece of paper, (2) a crease pattern of axis-parallel and diagonal (45-degree) creases on a square piece of paper, and (3) crease patterns without a mountain/valley assignment.
Comment: 24 pages, 19 figures. Version 3 includes several improvements thanks to referees, including formal definitions of simple folds, more figures, a table summarizing results, new open problems, and additional reference

arXiv.org e-Print Archive

Elsevier - Publisher Connector
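For the "When Can You Fold a Map?" entry above: a sketch of deciding flat foldability of a 1D mountain-valley pattern by repeatedly applying two local operations, an "end fold" (fold an end segment that is no longer than its neighbor) and a "crimp" (fold a segment bounded by one mountain and one valley crease when it is no longer than either neighbor). This follows our reading of the 1D characterization, namely that a pattern is flat-foldable exactly when such operations reduce it to a single segment, applied greedily; treat it as an assumption, not code from the paper. The input encoding (segment lengths plus a string of `M`/`V` creases) is hypothetical.

```python
def is_flat_foldable_1d(lengths, directions):
    """Greedy crimp/end-fold reduction for a 1D mountain-valley crease pattern.

    lengths:    positive lengths of the n segments, left to right
    directions: n - 1 crease labels, 'M' or 'V'; crease k joins segments k and k + 1
    Assumes (our reading) that any applicable crimp or end fold preserves foldability.
    """
    segs, dirs = list(lengths), list(directions)
    if len(dirs) != len(segs) - 1:
        raise ValueError("need exactly one crease between consecutive segments")
    while dirs:
        # End fold on the left: the first segment fits onto its neighbor.
        if segs[0] <= segs[1]:
            segs.pop(0)
            dirs.pop(0)
            continue
        # End fold on the right.
        if segs[-1] <= segs[-2]:
            segs.pop()
            dirs.pop()
            continue
        for i in range(len(dirs) - 1):
            # Creases i and i + 1 bound segment i + 1; a crimp needs opposite directions
            # and a middle segment no longer than both neighbors.
            if dirs[i] != dirs[i + 1] and segs[i] >= segs[i + 1] <= segs[i + 2]:
                # The Z-fold merges three segments into one of this total extent.
                segs[i:i + 3] = [segs[i] - segs[i + 1] + segs[i + 2]]
                del dirs[i:i + 2]
                break
        else:
            return False  # no operation applies
    return True


print(is_flat_foldable_1d([2, 1, 2], "MV"))  # True: a single crimp (Z-fold)
print(is_flat_foldable_1d([2, 1, 2], "MM"))  # False: both long flaps would collide
```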

Two-dimensional prefix string matching and covering on square matrices

Author: Crochemore Maxime
Iliopoulos Costas S.
Korda Maureen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1998

Two linear-time algorithms are presented: one for determining, for every position in a given square matrix, the longest prefix of a given pattern (also a square matrix) that occurs at that position, and one for computing all square covers of a given two-dimensional square matrix.

CiteSeerX

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM
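For the "Two-dimensional prefix string matching and covering on square matrices" entry above: a brute-force sketch of the first problem, useful for pinning down the definition or as a test oracle. It assumes that a prefix of size $k$ of a square pattern is its top-left $k \times k$ submatrix; it runs in roughly cubic time per position and is not the paper's linear-time algorithm. The function name is hypothetical.

```python
def longest_prefix_table(text, pattern):
    """For each position (i, j) of an n x n text, the largest k such that the
    top-left k x k submatrix of the m x m pattern occurs with its corner at (i, j).

    text and pattern are square matrices given as lists of equal-length strings.
    """
    n, m = len(text), len(pattern)
    table = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            k = 0
            # Grow the matched k x k prefix while the next L-shaped border matches:
            # row k (columns 0..k) and column k (rows 0..k-1) of the pattern.
            while (
                k < m
                and i + k < n
                and j + k < n
                and all(text[i + k][j + c] == pattern[k][c] for c in range(k + 1))
                and all(text[i + r][j + k] == pattern[r][k] for r in range(k))
            ):
                k += 1
            table[i][j] = k
    return table


print(longest_prefix_table(["aba", "bab", "aba"], ["ab", "ba"]))
# [[2, 0, 1], [0, 2, 0], [1, 0, 1]]
```

Checking only the new L-shaped border at each step works because a size-$(k+1)$ prefix contains the size-$k$ prefix, so matches grow monotonically.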