DNA-inspired online behavioral modeling and its application to spambot detection
We propose a strikingly novel, simple, and effective approach to model online
user behavior: we extract and analyze digital DNA sequences from user online
actions and we use Twitter as a benchmark to test our proposal. We obtain an
incisive and compact DNA-inspired characterization of user actions. Then, we
apply standard DNA analysis techniques to discriminate between genuine and
spambot accounts on Twitter. An experimental campaign supports our proposal,
showing its effectiveness and viability. To the best of our knowledge, we are
the first ones to identify and adapt DNA-inspired techniques to online user
behavioral modeling. While Twitter spambot detection is a specific use case on
a specific social media, our proposed methodology is platform and technology
agnostic, hence paving the way for diverse behavioral characterization tasks.
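The idea of a digital DNA sequence can be illustrated with a minimal sketch. The abstract names neither the alphabet nor the specific analysis technique, so everything here is an assumption made for illustration: the mapping `ACTION_TO_BASE`, the helper names, and the use of the longest common substring as the similarity signal.

```python
from typing import Iterable

# Hypothetical alphabet: one base per action type (an assumption, not taken from the abstract).
ACTION_TO_BASE = {"tweet": "A", "reply": "C", "retweet": "T"}


def encode_actions(actions: Iterable[str]) -> str:
    """Map a chronological list of user actions to a digital DNA string."""
    return "".join(ACTION_TO_BASE[a] for a in actions)


def longest_common_substring(a: str, b: str) -> str:
    """Classic O(len(a) * len(b)) dynamic program over common-suffix lengths."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending at a[i-1] and b[j-1].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]
```

Under these assumptions, two accounts whose encoded timelines share an unusually long common substring would be candidates for closer inspection; the threshold for "unusually long" is left open here.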
Fast Arc-Annotated Subsequence Matching in Linear Space
An arc-annotated string is a string of characters, called bases, augmented
with a set of pairs, called arcs, each connecting two bases. Given
arc-annotated strings $P$ and $Q$, the arc-preserving subsequence problem is to
determine if $P$ can be obtained from $Q$ by deleting bases from $Q$. Whenever
a base is deleted any arc with an endpoint in that base is also deleted.
Arc-annotated strings where the arcs are "nested" are a natural model of RNA
molecules that captures both their primary and secondary structure. The
arc-preserving subsequence problem for nested arc-annotated strings is a basic
primitive for investigating the function of RNA molecules. Gramm et al. [ACM
Trans. Algorithms 2006] gave an algorithm for this problem using $O(nm)$ time
and space, where $m$ and $n$ are the lengths of $P$ and $Q$, respectively. In
this paper we present a new algorithm using $O(nm)$ time and $O(n + m)$ space,
thereby matching the previous time bound while significantly reducing the space
from a quadratic term to linear. This is essential to process large RNA
molecules where the space is likely to be a bottleneck. To obtain our result we
introduce several novel ideas which may be of independent interest for related
problems on arc-annotated strings.
Comment: To appear in Algoritmic
Palindrome Recognition In The Streaming Model
In the Palindrome Problem one tries to find all palindromes (palindromic
substrings) in a given string. A palindrome is defined as a string which reads
forwards the same as backwards, e.g., the string "racecar". A related problem
is the Longest Palindromic Substring Problem in which finding an arbitrary one
of the longest palindromes in the given string suffices. We regard the
streaming version of both problems. In the streaming model the input arrives
over time and at every point in time we are only allowed to use sublinear
space. The main algorithms in this paper are the following: The first one is a
one-pass randomized algorithm that solves the Palindrome Problem. It has an
additive error and uses $O(\sqrt{n})$ space. The second algorithm is a two-pass
algorithm which determines the exact locations of all longest palindromes. It
uses the first algorithm as the first pass. The third algorithm is again a
one-pass randomized algorithm, which solves the Longest Palindromic Substring
Problem. It has a multiplicative error using only $O(\log n)$ space. We also
give two variants of the first algorithm which solve other related practical
problems.
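For reference, the Longest Palindromic Substring Problem defined above can be solved offline by expanding around each possible center. The sketch below is not one of the streaming algorithms from the paper: it is exact, keeps the whole string in memory, and runs in quadratic time in the worst case. The function name is hypothetical.

```python
def longest_palindromic_substring(s: str) -> str:
    """Return one longest palindromic substring of s (the first one found on ties)."""
    best_start, best_len = 0, 0
    # There are 2 * len(s) - 1 centers: every character and every gap between characters.
    for center in range(2 * len(s) - 1):
        left = center // 2
        right = left + center % 2
        # Grow the window while it stays inside s and remains a palindrome.
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        length = right - left - 1
        if length > best_len:
            best_start, best_len = left + 1, length
    return s[best_start:best_start + best_len]
```

The streaming setting rules this approach out, because the expansion step needs random access to characters that a sublinear-space algorithm can no longer afford to store.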
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.
When Can You Fold a Map?
We explore the following problem: given a collection of creases on a piece of
paper, each assigned a folding direction of mountain or valley, is there a flat
folding by a sequence of simple folds? There are several models of simple
folds; the simplest one-layer simple fold rotates a portion of paper about a
crease in the paper by ±180 degrees. We first consider the analogous questions
in one dimension lower -- bending a segment into a flat object -- which lead to
interesting problems on strings. We develop efficient algorithms for the
recognition of simply foldable 1D crease patterns, and reconstruction of a
sequence of simple folds. Indeed, we prove that a 1D crease pattern is
flat-foldable by any means precisely if it is by a sequence of one-layer simple
folds.
Next we explore simple foldability in two dimensions, and find a surprising
contrast: "map" folding and variants are polynomial, but slight
generalizations are NP-complete. Specifically, we develop a linear-time
algorithm for deciding foldability of an orthogonal crease pattern on a
rectangular piece of paper, and prove that it is (weakly) NP-complete to decide
foldability of (1) an orthogonal crease pattern on an orthogonal piece of paper,
(2) a crease pattern of axis-parallel and diagonal (45-degree) creases on a
square piece of paper, and (3) crease patterns without a mountain/valley
assignment.
Comment: 24 pages, 19 figures. Version 3 includes several improvements thanks
to referees, including formal definitions of simple folds, more figures,
table summarizing results, new open problems, and additional reference
Two-dimensional prefix string matching and covering on square matrices
Two linear-time algorithms are presented: one for determining, for every position in a given square matrix, the longest prefix of a given pattern (also a square matrix) that occurs at that position, and one for computing all square covers of a given two-dimensional square matrix.
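To make the first problem concrete, the sketch below computes the same output by brute force. It assumes that a "prefix" of a square pattern means its top-left k-by-k submatrix; that reading, the function names, and the representation of matrices as lists of equal-length strings are all assumptions. It is a naive baseline and not the linear-time algorithm the abstract refers to.

```python
from typing import List, Sequence


def longest_prefix_at(text: Sequence[str], pattern: Sequence[str], i: int, j: int) -> int:
    """Largest k such that the top-left k x k block of pattern occurs in text at (i, j)."""
    n, m = len(text), len(pattern)
    k = 0
    while k < m and i + k < n and j + k < n:
        # Growing from k x k to (k+1) x (k+1) only adds row k and column k.
        row_ok = all(text[i + k][j + c] == pattern[k][c] for c in range(k + 1))
        col_ok = all(text[i + r][j + k] == pattern[r][k] for r in range(k + 1))
        if not (row_ok and col_ok):
            break
        k += 1
    return k


def all_prefix_lengths(text: Sequence[str], pattern: Sequence[str]) -> List[List[int]]:
    """Table of longest-prefix lengths for every position of a square text matrix."""
    n = len(text)
    return [[longest_prefix_at(text, pattern, i, j) for j in range(n)] for i in range(n)]
```

Because a k-by-k prefix can only match where the (k-1)-by-(k-1) prefix already matches, the loop may stop at the first failed extension.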