Search CORE

1,146 research outputs found

An adaptive hybrid pattern-matching algorithm on indeterminate strings

Author: Smyth Bill
Wang S.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2009
Field of study

We describe a hybrid pattern-matching algorithm that works on both regular and indeterminate strings. This algorithm is inspired by the recently proposed hybrid algorithm FJS and its indeterminate successor. However, as discussed in this paper, because of the special properties of indeterminate strings, it is not straightforward to directly migrate FJS to an indeterminate version. Our new algorithm combines two fast pattern-matching algorithms, Shift-And and BMS (the Sunday variant of the Boyer-Moore algorithm), and is highly adaptive to the nature of the text being processed. It avoids using the border array, and therefore avoids some of the cases that are awkward for indeterminate strings. Although not always the fastest in individual test cases, our new algorithm is superior in overall performance to its two component algorithms, perhaps a general advantage of hybrid algorithms.
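As a rough illustration of one of the two component techniques named in this abstract, the sketch below applies the classic Shift-And bit-parallel scan to indeterminate strings. It is not the paper's hybrid algorithm. The representation (each position is a Python set of letters, and two positions match when their sets intersect) and the function name `shift_and_indeterminate` are assumptions made for this example.

```python
def shift_and_indeterminate(text, pattern):
    """Return 0-based start positions where pattern matches text.

    Assumption: text and pattern are sequences of non-empty sets of
    letters, and two positions match when their sets intersect.
    """
    m = len(pattern)
    if m == 0:
        return []
    # masks[c] has bit j set when letter c is allowed at pattern position j.
    masks = {}
    for j, symbol in enumerate(pattern):
        for c in symbol:
            masks[c] = masks.get(c, 0) | (1 << j)
    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0
    hits = []
    for i, symbol in enumerate(text):
        # A text symbol matches pattern position j if any of its letters does.
        symbol_mask = 0
        for c in symbol:
            symbol_mask |= masks.get(c, 0)
        # Standard Shift-And update: extend every partial match by one position.
        state = ((state << 1) | 1) & symbol_mask
        if state & accept:
            hits.append(i - m + 1)
    return hits


text = [{"a"}, {"a", "c"}, {"g"}, {"a"}, {"c"}, {"g"}]
pattern = [{"a", "t"}, {"c"}, {"g"}]
print(shift_and_indeterminate(text, pattern))
```

A regular string is the special case in which every set has size one, so the same function handles both kinds of input.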

Research Repository

espace@Curtin

An adaptive hybrid pattern-matching algorithm on indeterminate strings

Author: Smyth W.F.
Wang S.
Yu M.
Publication venue
Publication date: 01/01/2008
Field of study

We describe a hybrid pattern-matching algorithm that works on both regular and indeterminate strings. This algorithm is inspired by the recently proposed hybrid algorithm FJS [11] and its indeterminate successor [15]. However, as discussed in this paper, because of the special properties of indeterminate strings, it is not straightforward to directly migrate FJS to an indeterminate version. Our new algorithm combines two fast pattern-matching algorithms, Shift-And and BMS (the Sunday variant of the Boyer-Moore algorithm), and is highly adaptive to the nature of the text being processed. It avoids using the border array, and therefore avoids some of the cases that are awkward for indeterminate strings. Although not always the fastest in individual test cases, our new algorithm is superior in overall performance to its two component algorithms, perhaps a general advantage of hybrid algorithms.

Research Repository

Linear Algorithm for Conservative Degenerate Pattern Matching

Author: Crochemore Maxime
Iliopoulos Costas S.
Kundu Ritu
Mohamed Manal
Vayani Fatima
Publication venue
Publication date: 15/06/2015
Field of study

A degenerate symbol x* over an alphabet A is a non-empty subset of A, and a sequence of such symbols is a degenerate string. A degenerate string is said to be conservative if its number of non-solid symbols is upper-bounded by a fixed positive constant k. We consider here the matching problem of conservative degenerate strings and present the first linear-time algorithm that can find, for given degenerate strings P* and T* of total length n containing k non-solid symbols in total, the occurrences of P* in T* in O(nk) time.

arXiv.org e-Print Archive

King's Research Portal

Computing Covers Using Prefix Tables

Author: Alatabbi Ali
Rahman M. Sohel
Smyth W. F.
Publication venue
Publication date: 01/01/2015
Field of study

An indeterminate string x = x[1..n] on an alphabet Σ is a sequence of nonempty subsets of Σ; x is said to be regular if every subset is of size one. A proper substring u of regular x is said to be a cover of x iff for every i ∈ 1..n, an occurrence of u in x includes x[i]. The cover array γ = γ[1..n] of x is an integer array such that γ[i] is the longest cover of x[1..i]. Fifteen years ago a complex, though nevertheless linear-time, algorithm was proposed to compute the cover array of regular x based on prior computation of the border array of x. In this paper we first describe a linear-time algorithm to compute the cover array of regular string x based on the prefix table of x. We then extend this result to indeterminate strings. Comment: 14 pages, 1 figure
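To make the cover and cover-array definitions in this abstract concrete, here is a brute-force sketch for regular strings only. It is far from the linear-time algorithm the abstract describes. The function names `is_cover` and `cover_array` are hypothetical, indices are 0-based, and γ[i] is reported as a length, with 0 meaning that no proper cover exists.

```python
def is_cover(u, x):
    """True if occurrences of u in x, possibly overlapping, include every position."""
    if not u or len(u) > len(x):
        return False
    covered_up_to = 0  # positions x[0:covered_up_to] are covered so far
    start = x.find(u)
    while start != -1:
        if start > covered_up_to:
            return False  # a gap that no occurrence of u covers
        covered_up_to = start + len(u)
        start = x.find(u, start + 1)
    return covered_up_to == len(x)


def cover_array(x):
    """gamma[i-1] = length of the longest proper cover of x[:i], or 0 if none."""
    gamma = []
    for i in range(1, len(x) + 1):
        prefix = x[:i]
        longest = 0
        # A cover must include the first position, so it is a prefix of `prefix`.
        for length in range(i - 1, 0, -1):
            if is_cover(prefix[:length], prefix):
                longest = length
                break
        gamma.append(longest)
    return gamma


print(cover_array("ababaaba"))
```

For example, "aba" covers "ababaaba" through its occurrences at positions 0, 2 and 5, so the last entry of the array is 3.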

arXiv.org e-Print Archive

Research Repository

King's Research Portal

Covering Problems for Partial Words and for Indeterminate Strings

Author: A Apostolico
A Kalai
CS Iliopoulos
D Breslauer
D Lokshtanov
D Moore
J Holub
KR Abrahamson
MF Bari
MJ Fischer
P Antoniou
R Impagliazzo
T Kociumaka
WF Smyth
Y Li
Publication venue
Publication date: 01/01/2014
Field of study

We consider the problem of computing a shortest solid cover of an indeterminate string. An indeterminate string may contain non-solid symbols, each of which specifies a subset of the alphabet that could be present at the corresponding position. We also consider covering partial words, which are a special case of indeterminate strings where each non-solid symbol is a don't care symbol. We prove that the indeterminate string covering problem and the partial word covering problem are NP-complete for a binary alphabet and show that both problems are fixed-parameter tractable with respect to k, the number of non-solid symbols. For the indeterminate string covering problem we obtain a 2^{O(k log k)} + n k^{O(1)}-time algorithm. For the partial word covering problem we obtain a 2^{O(sqrt(k) log k)} + n k^{O(1)}-time algorithm. We prove that, unless the Exponential Time Hypothesis is false, no 2^{o(sqrt(k))} n^{O(1)}-time solution exists for either problem, which shows that our algorithm for this case is close to optimal. We also present an algorithm for both problems which is feasible in practice. Comment: full version (simplified and corrected); preliminary version appeared at ISAAC 2014; 14 pages, 4 figures

arXiv.org e-Print Archive

Crossref

King's Research Portal

Efficient pattern matching in degenerate strings with the Burrows–Wheeler transform

Author: Daykin Jacqueline
Groult Richard
Guesnet Yannick
Lecroq Thierry
Lefebvre Arnaud
Léonard Martine
Mouchard Laurent
Prieur-Gaston Élise
Watson Bruce
Publication venue
Publication date: 01/07/2019
Field of study

A degenerate or indeterminate string on an alphabet Σ is a sequence of non-empty subsets of Σ. Given a degenerate string t of length n, we present a new method based on the Burrows-Wheeler transform for searching for a degenerate pattern of length m in t, running in O(mn) time on a constant-size alphabet Σ. Furthermore, it is a hybrid pattern-matching technique that works on both regular and degenerate strings. A degenerate string is said to be conservative if its number of non-solid letters is upper-bounded by a fixed positive constant q; in this case we show that the search time complexity is O(qm^2). Experimental results show that our method performs well in practice.

HAL - Normandie Université

Aberystwyth Research Portal

HAL Descartes

Truly Subquadratic-Time Extension Queries and Periodicity Detection in Strings with Uncertainties

Author: Iliopoulos Costas S.
Radoszewski Jakub
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual Symposium on Combinatorial Pattern Matching (CPM 2016)
Publication date: 01/01/2016
Field of study

Strings with don't care symbols, also called partial words, and more general indeterminate strings are a natural representation of strings containing uncertain symbols. A considerable effort has been made to obtain efficient algorithms for pattern matching and periodicity detection in such strings. Among those, a number of algorithms have been proposed that behave well on random data, but their worst-case running time is still Θ(n^2). We present the first truly subquadratic-time solutions for a number of such problems on partial words that can also be adapted to indeterminate strings over a constant-sized alphabet. We show that n longest common compatible prefix queries (which correspond to longest common extension queries in regular strings) can be answered on-line in O(n * sqrt(n * log(n))) time after O(n * sqrt(n * log(n)))-time preprocessing. We also present O(n * sqrt(n * log(n)))-time algorithms for computing the prefix array and two types of border array of a partial word.

Dagstuhl Research Online Publication Server

King's Research Portal

Disclosing false identity through hybrid link analysis

Author: Boongoen Tossapon
Price Christopher John
Shen Qiang
Publication venue
Publication date: 01/03/2010
Field of study

Aberystwyth Research Portal

Indeterminate strings, prefix arrays & undirected graphs

Author: Christodoulakis M.
Ryan P.J.
Smyth W.F.
Wang S.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015
Field of study

An integer array y=y[1..n] is said to be feasible if and only if y[1]=n and, for every i∈2..n, i≤i+y[i]≤n+1. A string is said to be indeterminate if and only if at least one of its elements is a subset of cardinality greater than one of a given alphabet Σ; otherwise it is said to be regular. A feasible array y is said to be regular if and only if it is the prefix array of some regular string. We show using a graph model that every feasible array of integers is a prefix array of some (indeterminate or regular) string, and for regular strings corresponding to y, we use the model to provide a lower bound on the alphabet size. We show further that there is a 1–1 correspondence between labelled simple graphs and indeterminate strings, and we show how to determine the minimum alphabet size σ of an indeterminate string x based on its associated graph Gx. Thus, in this sense, indeterminate strings are a more natural object of combinatorial interest than the strings on elements of Σ that have traditionally been studied.
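The feasibility condition stated at the start of this abstract translates directly into a short check. The sketch below is an illustration only: the function name `is_feasible` is hypothetical, the array is passed as a 0-based Python list standing for y[1..n], and treating the empty array as infeasible is an assumption, since the abstract does not address that case.

```python
def is_feasible(y):
    """Check y[1] = n and i <= i + y[i] <= n + 1 for every i in 2..n.

    y is a 0-based Python list standing for the 1-based array y[1..n].
    """
    n = len(y)
    if n == 0 or y[0] != n:  # assumption: the empty array is not feasible
        return False
    for i in range(2, n + 1):  # i is the 1-based position
        value = y[i - 1]
        if not (i <= i + value <= n + 1):
            return False
    return True


print(is_feasible([5, 0, 1, 2, 0]))  # prefix array of the regular string "abaab"
print(is_feasible([5, 0, 1, 3, 0]))  # y[4] = 3 would run past position n + 1
```

The condition amounts to 0 ≤ y[i] ≤ n + 1 - i: a match starting at position i can neither have negative length nor extend beyond the end of the string.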

Research Repository

Enhanced covers of regular & indeterminate strings using prefix tables

Author: Alatabbi A.
Rahman M.S.
Simpson J.
Smyth W.F.
Sohidull Islam A.S.
Publication venue: 'Cornell University Library'
Publication date: 01/01/2015
Field of study

A cover of a string x=x[1..n] is a proper substring u of x such that x can be constructed from possibly overlapping instances of u. A recent paper [FIKPPST13] relaxes this definition: an enhanced cover u of x is a border of x (that is, a proper prefix that is also a suffix) that covers a maximum number of positions in x (not necessarily all). That paper also proposes efficient algorithms for the computation of enhanced covers. These algorithms depend on the prior computation of the border array β[1..n], where β[i] is the length of the longest border of x[1..i], 1≤i≤n. In this paper, we first show how to compute enhanced covers using instead the prefix table: an array π[1..n] such that π[i] is the length of the longest substring of x beginning at position i that matches a prefix of x. Unlike the border array, the prefix table is robust: its properties hold also for indeterminate strings, that is, strings defined on subsets of the alphabet Σ rather than individual elements of Σ. Thus, our algorithms, in addition to being faster in practice and more space-efficient than those of [FIKPPST13], allow us to easily extend the computation of enhanced covers to indeterminate strings. Both for regular and indeterminate strings, our algorithms execute in expected linear time. Along the way we establish an important theoretical result: that the expected maximum length of any border of any prefix of a regular string x is approximately 1.64 for binary alphabets, less for larger ones.
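The prefix table definition in this abstract can be illustrated with a direct quadratic-time computation that works unchanged on regular and indeterminate strings. This is a sketch of the definition, not the expected-linear-time method of the paper. The name `prefix_table`, the set-based representation, and the matching rule (two positions match when their sets intersect) are assumptions made for this example.

```python
def prefix_table(x):
    """pi[i] = length of the longest substring starting at i that matches a prefix of x.

    Assumption: x is a list of non-empty sets of letters (0-based), and two
    positions match when their sets have a letter in common.
    """
    n = len(x)
    pi = []
    for i in range(n):
        length = 0
        # Extend the match position by position while the sets intersect.
        while i + length < n and x[length] & x[i + length]:
            length += 1
        pi.append(length)
    return pi


regular = [{c} for c in "abaab"]
indeterminate = [{"a"}, {"a", "b"}, {"b"}, {"a"}]
print(prefix_table(regular))
print(prefix_table(indeterminate))
```

Because each entry is defined by position-by-position comparison against the prefix, the same definition applies to indeterminate strings, even though matching by set intersection is not transitive.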

Research Repository