17 research outputs found
Computing Covers Using Prefix Tables
An \emph{indeterminate string} $x = x[1..n]$ on an alphabet $\Sigma$ is a
sequence of nonempty subsets of $\Sigma$; $x$ is said to be \emph{regular} if
every subset is of size one. A proper substring $u$ of regular $x$ is said to
be a \emph{cover} of $x$ iff for every $i \in 1..n$, an occurrence of $u$ in
$x$ includes $x[i]$. The \emph{cover array} $\gamma = \gamma[1..n]$ of $x$ is
an integer array such that $\gamma[i]$ is the length of the longest cover of $x[1..i]$.
Fifteen years ago a complex, though nevertheless linear-time, algorithm was
proposed to compute the cover array of regular $x$ based on prior computation
of the border array of $x$. In this paper we first describe a linear-time
algorithm to compute the cover array of regular string $x$ based on the prefix
table of $x$. We then extend this result to indeterminate strings. Comment: 14 pages, 1 figure
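The cover definition above can be made concrete with a small brute-force sketch for regular strings. This is not the linear-time algorithm the abstract describes; it only checks the definition directly, and the function names `is_cover` and `cover_array` are illustrative assumptions. Indices are 0-based in the code, and an entry of 0 means the prefix has no proper cover.

```python
def is_cover(u: str, x: str) -> bool:
    """Return True if the occurrences of u in x cover every position of x."""
    m = len(u)
    covered_up_to = 0  # positions x[0:covered_up_to] are covered so far
    for start in range(len(x) - m + 1):
        if x.startswith(u, start):
            if start > covered_up_to:
                return False  # a gap that no occurrence of u can cover
            covered_up_to = start + m
    return covered_up_to == len(x)


def cover_array(x: str) -> list[int]:
    """Brute-force cover array: entry i-1 is the length of the longest
    proper cover of the prefix x[:i], or 0 if there is none."""
    gamma = []
    for i in range(1, len(x) + 1):
        prefix = x[:i]
        best = 0
        # Try candidate covers from longest to shortest proper prefix.
        for m in range(i - 1, 0, -1):
            if is_cover(prefix[:m], prefix):
                best = m
                break
        gamma.append(best)
    return gamma


print(cover_array("abaabaab"))
```

The check runs in polynomial rather than linear time, so it is suited only to validating a faster implementation on small inputs.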
Inferring an Indeterminate String from a Prefix Graph
An \emph{indeterminate string} (or, more simply, just a \emph{string}) $x = x[1..n]$
on an alphabet $\Sigma$ is a sequence of nonempty subsets of
$\Sigma$. We say that $x[i_1]$ and $x[i_2]$ \emph{match} (written
$x[i_1] \approx x[i_2]$) if and only if $x[i_1] \cap x[i_2] \ne \emptyset$.
A \emph{feasible array} is an array $y = y[1..n]$ of
integers such that $y[1] = n$ and for every $i \in 2..n$, $y[i] \in 0..n-i+1$.
A \emph{prefix table} of a string $x$ is an array $\pi = \pi[1..n]$
of integers such that, for every $i \in 1..n$, $\pi[i] = j$
if and only if $x[i..i+j-1]$ is the longest substring at position $i$
of $x$ that matches a prefix of $x$. It is known from \cite{CRSW13} that
every feasible array is a prefix table of some indeterminate string. A
\emph{prefix graph} $\mathcal{P} = \mathcal{P}_{y}$ is a labelled simple
graph whose structure is determined by a feasible array $y$. In this paper we
show, given a feasible array $y$, how to use $\mathcal{P}_{y}$ to
construct a lexicographically least indeterminate string on a minimum alphabet
whose prefix table $\pi = y$. Comment: 13 pages, 1 figure
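As a rough illustration of the prefix table definition, the sketch below computes it naively for an indeterminate string represented as a Python list of nonempty sets. The representation and the name `prefix_table` are assumptions made for this example, not the paper's method. Because matching between indeterminate letters is not transitive, the sketch compares positions one by one, which takes quadratic time in the worst case. Indices are 0-based, so `pi[0]` corresponds to $\pi[1]$.

```python
def prefix_table(x: list[set[str]]) -> list[int]:
    """Naive prefix table of an indeterminate string.

    x is a list of nonempty sets of letters; two positions match when
    their sets intersect. pi[i] is the length of the longest substring
    starting at position i that matches a prefix of x.
    """
    n = len(x)
    pi = [0] * n
    for i in range(n):
        j = 0
        # Extend the match while the j-th prefix letter intersects x[i + j].
        while i + j < n and x[j] & x[i + j]:
            j += 1
        pi[i] = j
    return pi


# The third position {a, b} matches both 'a' and 'b'.
x = [{"a"}, {"b"}, {"a", "b"}, {"a"}]
print(prefix_table(x))
```

The result always satisfies the feasible-array conditions quoted above: the first entry equals $n$, and each later entry cannot exceed the number of remaining positions.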
String Comparison in V-Order: New Lexicographic Properties & On-line Applications
V-order is a global order on strings related to Unique Maximal
Factorization Families (UMFFs), which are themselves generalizations of Lyndon
words. V-order has recently been proposed as an alternative to
lexicographical order in the computation of suffix arrays and in the
suffix-sorting induced by the Burrows-Wheeler transform. Efficient V-ordering
of strings thus becomes a matter of considerable interest. In this paper we
present new and surprising results on V-order in strings, then go on to
explore the algorithmic consequences.
Algorithms for Longest Common Abelian Factors
In this paper we consider the problem of computing the longest common abelian
factor (LCAF) between two given strings. We present a simple $O(\sigma n^2)$
time algorithm, where $n$ is the length of the strings and $\sigma$ is the
alphabet size, and a sub-quadratic running time solution for the binary string
case, both having linear space requirement. Furthermore, we present a modified
algorithm applying some interesting tricks and experimentally show that the
resulting algorithm runs faster. Comment: 13 pages, 4 figures
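A minimal sketch of the LCAF problem follows, assuming the usual definition that two factors are abelian-equivalent when they have the same letter counts (Parikh vectors). It is a straightforward sliding-window approach, not necessarily the algorithm from the paper, and the names `window_vectors` and `lcaf_length` are hypothetical. Every length is tested because a common abelian factor of length $k$ does not imply one of length $k-1$.

```python
def window_vectors(s: str, length: int, index: dict[str, int]) -> set[tuple[int, ...]]:
    """Letter-count vectors of all windows of the given length in s."""
    counts = [0] * len(index)
    for ch in s[:length]:
        counts[index[ch]] += 1
    seen = {tuple(counts)}
    # Slide the window one position at a time, updating two counters.
    for k in range(length, len(s)):
        counts[index[s[k]]] += 1
        counts[index[s[k - length]]] -= 1
        seen.add(tuple(counts))
    return seen


def lcaf_length(a: str, b: str) -> int:
    """Length of the longest common abelian factor of a and b (0 if none)."""
    index = {ch: k for k, ch in enumerate(sorted(set(a) | set(b)))}
    # Check lengths from longest to shortest and stop at the first hit.
    for length in range(min(len(a), len(b)), 0, -1):
        if window_vectors(a, length, index) & window_vectors(b, length, index):
            return length
    return 0


print(lcaf_length("aab", "bba"))
```

Only the count vectors for one length are held at a time, so memory stays proportional to the number of windows times the alphabet size.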
Absent words and the (dis)similarity analysis of DNA sequences: An experimental study
Additional file 1. All Distance Matrices. In this file (AllMatrices), all the distance matrices are provided.
Querying Highly Similar Structured Sequences via Binary Encoding and Word Level Operations
Part 8: First Workshop on Algorithms for Data and Text Mining in Bioinformatics (WADTMB 2012)
In the post-genomic era there has been an explosion in the amount of genomic data available, and the primary research problems have moved from being able to produce interesting biological data to being able to efficiently process and store this information. In this paper we present efficient data structures and algorithms for the High Similarity Sequencing Problem. In the High Similarity Sequencing Problem we are given the sequences S0, S1, …, Sk where Sj = and must perform pattern matching on the set of sequences. We present time- and memory-efficient data structures that exploit the extensive similarity of the sequences; our solution leads to a query time of with a memory usage of O(N log N + vk log vk).
Simple linear comparison of strings in V-order
In this paper we focus on a total (but non-lexicographic) ordering of strings called V-order. We devise a new linear-time algorithm for computing the V-comparison of two finite strings. In comparison with the previous algorithm in the literature, our algorithm is both conceptually simpler, based on recording letter positions in increasing order, and more straightforward to implement, requiring only linked lists.