17 research outputs found

Computing Covers Using Prefix Tables

Author: Alatabbi Ali
Rahman M. Sohel
Smyth W. F.
Publication venue
Publication date: 01/01/2015
Field of study

An \emph{indeterminate string} $x = x[1..n]$ on an alphabet $\Sigma$ is a sequence of nonempty subsets of $\Sigma$; $x$ is said to be \emph{regular} if every subset is of size one. A proper substring $u$ of regular $x$ is said to be a \emph{cover} of $x$ iff for every $i \in 1..n$, an occurrence of $u$ in $x$ includes $x[i]$. The \emph{cover array} $\gamma = \gamma[1..n]$ of $x$ is an integer array such that $\gamma[i]$ is the length of the longest cover of $x[1..i]$. Fifteen years ago a complex, though nevertheless linear-time, algorithm was proposed to compute the cover array of regular $x$ based on prior computation of the border array of $x$. In this paper we first describe a linear-time algorithm to compute the cover array of a regular string $x$ based on the prefix table of $x$. We then extend this result to indeterminate strings.

Comment: 14 pages, 1 figure
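The definitions in this abstract can be made concrete with a small sketch. The Python below is a naive illustration under stated assumptions, not the linear-time algorithm the paper describes: it uses 0-based indexing, computes the prefix table with the standard Z-algorithm, and then finds each cover-array entry by brute force. The function names `prefix_table`, `is_cover`, and `cover_array` are hypothetical and chosen only for this example.

```python
def prefix_table(x):
    """pi[i] = length of the longest substring starting at i that equals a prefix of x."""
    n = len(x)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n
    l = r = 0  # [l, r) is the rightmost prefix match found so far
    for i in range(1, n):
        if i < r:
            pi[i] = min(r - i, pi[i - l])
        while i + pi[i] < n and x[pi[i]] == x[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > r:
            l, r = i, i + pi[i]
    return pi


def is_cover(pi, k, m):
    """True if the prefix of length k covers the prefix of length m (k < m)."""
    last_end = k  # the occurrence at position 0 covers [0, k)
    for p in range(1, m - k + 1):
        if pi[p] >= k:  # x[:k] occurs at p and fits inside x[:m]
            if p > last_end:
                return False  # position last_end is left uncovered
            last_end = p + k
    return last_end == m


def cover_array(x):
    """gamma[m-1] = length of the longest proper cover of x[:m], or 0 if none (naive)."""
    pi = prefix_table(x)
    gamma = [0] * len(x)
    for m in range(1, len(x) + 1):
        for k in range(m - 1, 0, -1):
            if is_cover(pi, k, m):
                gamma[m - 1] = k
                break
    return gamma
```

For instance, for `"ababa"` the prefix `"aba"` occurs at positions 0 and 2 and so covers the whole string, while the prefix `"aba"` itself has no proper cover.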

arXiv.org e-Print Archive

Crossref

Research Repository

King's Research Portal

Inferring an Indeterminate String from a Prefix Graph

Author: Alatabbi Ali
Rahman M. Sohel
Smyth W. F.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014
Field of study

An \emph{indeterminate string} (or, more simply, just a \emph{string}) $x = x[1..n]$ on an alphabet $\Sigma$ is a sequence of nonempty subsets of $\Sigma$. We say that $x[i_1]$ and $x[i_2]$ \emph{match} (written $x[i_1] \approx x[i_2]$) if and only if $x[i_1] \cap x[i_2] \ne \emptyset$. A \emph{feasible array} is an array $y = y[1..n]$ of integers such that $y[1] = n$ and, for every $i \in 2..n$, $y[i] \in 0..n-i+1$. A \emph{prefix table} of a string $x$ is an array $\pi = \pi[1..n]$ of integers such that, for every $i \in 1..n$, $\pi[i] = j$ if and only if $x[i..i+j-1]$ is the longest substring at position $i$ of $x$ that matches a prefix of $x$. It is known from \cite{CRSW13} that every feasible array is a prefix table of some indeterminate string. A \emph{prefix graph} $\mathcal{P} = \mathcal{P}_{y}$ is a labelled simple graph whose structure is determined by a feasible array $y$. In this paper we show, given a feasible array $y$, how to use $\mathcal{P}_{y}$ to construct a lexicographically least indeterminate string on a minimum alphabet whose prefix table $\pi = y$.

Comment: 13 pages, 1 figure
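As an illustration of the definitions of matching, prefix table, and feasible array, here is a minimal Python sketch. It is not the paper's construction (it goes from a string to its prefix table, the opposite direction to the inference problem), and it makes several assumptions: an indeterminate string is represented as a list of nonempty Python sets, indexing is 0-based, and the names `matches`, `prefix_table_indeterminate`, and `is_feasible` are hypothetical. Because matching of indeterminate letters is not transitive, the sketch compares position by position directly from the definition, in quadratic time.

```python
def matches(a, b):
    """Two indeterminate letters match iff their intersection is nonempty."""
    return not a.isdisjoint(b)


def prefix_table_indeterminate(x):
    """pi[i] = length of the longest substring at i that matches a prefix of x."""
    n = len(x)
    pi = []
    for i in range(n):
        j = 0
        # Extend while the letter at i + j matches the letter at j.
        while i + j < n and matches(x[i + j], x[j]):
            j += 1
        pi.append(j)
    return pi


def is_feasible(y):
    """0-based form of the feasibility condition: y[0] == n and 0 <= y[i] <= n - i."""
    n = len(y)
    return n > 0 and y[0] == n and all(0 <= y[i] <= n - i for i in range(1, n))


# Example: the string {a}{a,b}{b}
x = [{"a"}, {"a", "b"}, {"b"}]
pi = prefix_table_indeterminate(x)
```

In this example position 1 matches position 0 (shared letter `a`) and position 2 matches position 1 (shared letter `b`), although positions 2 and 0 do not match each other.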

arXiv.org e-Print Archive

Crossref

Research Repository

King's Research Portal

String Comparison in $V$-Order: New Lexicographic Properties & On-line Applications

Author: Alatabbi Ali
Daykin Jacqueline W.
Rahman M. Sohel
Smyth W. F.
Publication venue
Publication date: 01/01/2015
Field of study

$V$-order is a global order on strings related to Unique Maximal Factorization Families (UMFFs), which are themselves generalizations of Lyndon words. $V$-order has recently been proposed as an alternative to lexicographical order in the computation of suffix arrays and in the suffix-sorting induced by the Burrows-Wheeler transform. Efficient $V$-ordering of strings thus becomes a matter of considerable interest. In this paper we present new and surprising results on $V$-order in strings, then go on to explore the algorithmic consequences.

arXiv.org e-Print Archive

Research Repository

Algorithms for Longest Common Abelian Factors

Author: Alatabbi Ali
Iliopoulos Costas S.
Langiu Alessio
Rahman M. Sohel
Publication venue
Publication date: 27/02/2015
Field of study

In this paper we consider the problem of computing the longest common abelian factor (LCAF) between two given strings. We present a simple $O(\sigma n^2)$ time algorithm, where $n$ is the length of the strings and $\sigma$ is the alphabet size, and a sub-quadratic running time solution for the binary string case, both with linear space requirements. Furthermore, we present a modified algorithm applying some interesting tricks and experimentally show that the resulting algorithm runs faster.

Comment: 13 pages, 4 figures
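To make the LCAF problem concrete, the following Python sketch finds the length of the longest pair of factors, one from each string, that have the same letter counts (Parikh vectors). It is a straightforward sliding-window illustration, not the paper's algorithm: it relies on hashing, its space usage is not tuned to match the linear-space claim, and the function names `window_parikh_vectors` and `longest_common_abelian_factor` are hypothetical. Building one count tuple per window costs $O(\sigma)$, which gives roughly $O(\sigma n^2)$ expected time over all lengths.

```python
def window_parikh_vectors(s, length, alphabet):
    """Yield the letter-count tuple of every window of the given length in s."""
    index = {c: i for i, c in enumerate(alphabet)}
    counts = [0] * len(alphabet)
    for c in s[:length]:
        counts[index[c]] += 1
    yield tuple(counts)
    for i in range(length, len(s)):
        counts[index[s[i]]] += 1           # letter entering the window
        counts[index[s[i - length]]] -= 1  # letter leaving the window
        yield tuple(counts)


def longest_common_abelian_factor(s, t):
    """Length of the longest factors of s and t with equal letter counts (0 if none)."""
    alphabet = sorted(set(s) | set(t))
    for length in range(min(len(s), len(t)), 0, -1):
        seen = set(window_parikh_vectors(s, length, alphabet))
        if any(v in seen for v in window_parikh_vectors(t, length, alphabet)):
            return length
    return 0
```

For `"aab"` and `"bba"`, no window of length 3 agrees, but `"ab"` and `"ba"` have the same counts, so the result is 2.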

arXiv.org e-Print Archive

Crossref

King's Research Portal

Advances in Stringology and Applications: From Combinatorics via Genomic Analysis to Computational Linguistics

Author: Alatabbi Ali
Publication venue
Publication date: 01/01/2015
Field of study

King's Research Portal

Absent words and the (dis)similarity analysis of DNA sequences: An experimental study

Author: Alatabbi Ali
Athar Tanver
Crochemore Maxime
Rahman M. Sohel
Rahman Mohammad Saifur
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/03/2016
Field of study

Additional file 1. All Distance Matrices. In this file (AllMatrices), all the distance matrices are provided.

Crossref

Springer - Publisher Connector

PubMed Central

King's Research Portal

The Francis Crick Institute

Querying Highly Similar Structured Sequences via Binary Encoding and Word Level Operations

Author: Alatabbi Ali
Barton Carl
Iliopoulos Costas
Mouchard Laurent
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/09/2012
Field of study

Part 8: First Workshop on Algorithms for Data and Text Mining in Bioinformatics (WADTMB 2012)

In the post-genomic era there has been an explosion in the amount of genomic data available, and the primary research problems have moved from being able to produce interesting biological data to being able to efficiently process and store this information. In this paper we present efficient data structures and algorithms for the High Similarity Sequencing Problem, in which we are given the sequences $S_0, S_1, \dots, S_k$, where $S_j = e_{j_1} I_{\sigma_1} e_{j_2} I_{\sigma_2} e_{j_3} I_{\sigma_3}, \dots, e_{j_\ell} I_{\sigma_\ell}$, and must perform pattern matching on the set of sequences. Our data structures are time and memory efficient because they exploit the extensive similarity of the sequences; our solution leads to a query time of $O(m + vk \log \ell + \frac{m \, occ_v \, v}{w} + \frac{PSC(p) m}{w})$ with a memory usage of $O(N \log N + vk \log vk)$.

HAL - Normandie Université

V

Author: Alatabbi
Alatabbi
Ali Alatabbi
Chen
Crochemore
Danh
Daykin
Daykin
Daykin
Daykin
Duval
Jacqueline W. Daykin
Juha Kärkkäinen
Ko
M. Sohel Rahman
Mantaci
W.F. Smyth
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

Simple linear comparison of strings in V-order

Author: Alatabbi Ali
Daykin Jackie
Rahman M. Sohel
Smyth William
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

In this paper we focus on a total (but non-lexicographic) ordering of strings called V-order. We devise a new linear-time algorithm for computing the V-comparison of two finite strings. Compared with the previous algorithm in the literature, our algorithm is both conceptually simpler, being based on recording letter positions in increasing order, and more straightforward to implement, requiring only linked lists.

Crossref

Research Repository

King's Research Portal