Search CORE

99 research outputs found

Computing the antiperiod(s) of a string

Author: Alamro Hayam
Badkobeh G.
Belazzougui D.
Iliopoulos C.S.
Puglisi S.J.
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2019

A string S[1, n] is a power (or repetition or tandem repeat) of order k and period n/k if it can be decomposed into k consecutive identical blocks of length n/k. Powers and periods are fundamental structures in the study of strings, and algorithms to compute them efficiently have been widely studied. Recently, Fici et al. (Proc. ICALP 2016) introduced an antipower of order k to be a string composed of k distinct blocks of the same length, n/k, called the antiperiod. An arbitrary string will have antiperiod t if it is a prefix of an antipower with antiperiod t. In this paper, we describe an efficient algorithm for computing the smallest antiperiod of a string S of length n in O(n) time. We also describe an algorithm to compute all the antiperiods of S that runs in O(n log n) time. © Hayam Alamro, Golnaz Badkobeh, Djamal Belazzougui, Costas S. Iliopoulos, and Simon J. Puglisi. Peer reviewed
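The antiperiod definition in this abstract can be illustrated with a brute-force check. The sketch below is not the O(n) algorithm of the paper; it simply tries every candidate length t. It also makes a simplifying assumption that is not stated in the abstract: only the complete length-t blocks of S must be pairwise distinct, and a shorter trailing block is treated as always extendable. The function names are hypothetical.

```python
def has_antiperiod(s: str, t: int) -> bool:
    """Return True if the complete length-t blocks of s are pairwise distinct.

    Assumption (not from the abstract): a trailing block shorter than t
    is ignored, i.e. treated as extendable to a fresh block.
    """
    blocks = [s[i:i + t] for i in range(0, len(s) - t + 1, t)]
    return len(set(blocks)) == len(blocks)


def smallest_antiperiod(s: str) -> int:
    """Brute-force search over t = 1..len(s); assumes s is non-empty."""
    for t in range(1, len(s) + 1):
        if has_antiperiod(s, t):
            return t
    raise ValueError("s must be non-empty")
```

For example, under this assumption `smallest_antiperiod("abab")` is 3, because the blocks for t = 1 (`a, b, a, b`) and t = 2 (`ab, ab`) both contain repeats.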

Goldsmiths Research Online

Dagstuhl Research Online Publication Server

Helsingin yliopiston digitaalinen arkisto

King's Research Portal

A characterization of the squares in a Fibonacci string

Author: Iliopoulos C.S.
Moore D.
Smyth W.F.
Publication venue: 'Elsevier BV'
Publication date: 01/01/1997

A (finite) Fibonacci string F_n is defined as follows: F_0 = b, F_1 = a; for every integer n ⩾ 2, F_n = F_{n-1}F_{n-2}. For n ⩾ 1, the length of F_n is denoted by f_n = |F_n|. The infinite Fibonacci string F is the string which contains every F_n, n ⩾ 1, as a prefix. Apart from their general theoretical importance, Fibonacci strings are often cited as worst-case examples for algorithms which compute all the repetitions or all the “Abelian squares” in a given string. In this paper we provide a characterization of all the squares in F, hence in every prefix F_n; this characterization naturally gives rise to a Θ(f_n) algorithm which specifies all the squares of F_n in an appropriate encoding. This encoding is made possible by the fact that the squares of F_n occur consecutively, in “runs”, the number of which is Θ(f_n). By contrast, the known general algorithms for the computation of the repetitions in an arbitrary string require Θ(f_n log f_n) time (and produce Θ(f_n log f_n) outputs) when applied to a Fibonacci string F_n.
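The recurrence in this abstract is easy to make concrete. The following sketch builds F_n from F_0 = b and F_1 = a and lists square occurrences by exhaustive comparison. The enumeration is a naive illustration only, not the linear-time encoding the paper describes, and the function names are hypothetical.

```python
def fibonacci_string(n: int) -> str:
    """Build F_n with F_0 = 'b', F_1 = 'a', F_n = F_{n-1} F_{n-2}."""
    prev, cur = "b", "a"  # F_0, F_1
    if n == 0:
        return prev
    for _ in range(2, n + 1):
        prev, cur = cur, cur + prev
    return cur


def naive_squares(s: str) -> list[tuple[int, int]]:
    """List every occurrence (start, period) of a square uu in s.

    Exhaustive check of all starts and periods; far slower than the
    specialised algorithm the abstract refers to.
    """
    n = len(s)
    found = []
    for p in range(1, n // 2 + 1):
        for i in range(n - 2 * p + 1):
            if s[i:i + p] == s[i + p:i + 2 * p]:
                found.append((i, p))
    return found
```

For instance, `fibonacci_string(5)` is `"abaababa"`, and `naive_squares` reports four square occurrences in it: `aa` at position 2, `abab` at position 3, `baba` at position 4, and `abaaba` at position 0 (0-indexed).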

Elsevier - Publisher Connector

Research Repository

The covers of a circular Fibonacci string

Author: Iliopoulos C.S.
Moore D.
Smyth W.F.
Publication venue: Charles Babbage Research Centre
Publication date: 01/01/1998

Fibonacci strings turn out to constitute worst cases for a number of computer algorithms which find generic patterns in strings. Examples of such patterns are repetitions, Abelian squares, and "covers". In particular, we characterize in this paper the covers of a circular Fibonacci string C(F_k) and show that they are \Theta(|F_k|^2) in number. We show also that, by making use of an appropriate encoding, these covers can be reported in \Theta(|F_k|) time. By contrast, the fastest known algorithm for computing the covers of an arbitrary circular string of length n requires time O(n log n).

Research Repository

A linear algorithm for computing all the squares of a Fibonacci string

Author: Iliopoulos C.S.
Moore D.
Smyth W.F.
Publication date: 01/01/1996

A (finite) Fibonacci string $F_n$ is defined as follows: $F_0 = b$, $F_1 = a$; for every integer $n \ge 2$, $F_n = F_{n-1}F_{n-2}$. For $n \ge 1$, the length of $F_n$ is denoted by $f_n = |F_n|$, while it is convenient to define $f_0 \equiv 0$. The infinite Fibonacci string $F$ is the string which contains every $F_n$, $n \ge 1$, as a prefix. Apart from their general theoretical importance, Fibonacci strings are often cited as worst case examples for algorithms which compute all the repetitions or all the ``Abelian squares'' in a given string. In this paper we provide a characterization of all the squares in $F$, hence in every prefix $F_n$; this characterization naturally gives rise to a $\Theta(f_n)$ algorithm which specifies all the squares of $F_n$ in an appropriate encoding. This encoding is made possible by the fact that the squares of $F_n$ occur consecutively, in ``runs'', the number of which is $\Theta(f_n)$. By contrast, the known general algorithms for the computation of the repetitions in an arbitrary string require $\Theta(f_n\log f_n)$ time (and produce $\Theta(f_n\log f_n)$ outputs) when applied to a Fibonacci string $F_n$.

Research Repository

On Quasiperiodic Morphisms

Author: A. Apostolico
A. Glen
C.S. Iliopoulos
F. Levé
G. Richomme
G.S. Brodal
N.J. Fine
R. Groult
R.C. Lyndon
S. Marcus
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Weakly and strongly quasiperiodic morphisms are tools introduced to study quasiperiodic words. Formally they map respectively at least one or any non-quasiperiodic word to a quasiperiodic word. Considering them both on finite and infinite words, we get four families of morphisms between which we study relations. We provide algorithms to decide whether a morphism is strongly quasiperiodic on finite words or on infinite words. Comment: 12 pages

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL Descartes

Efficient Seeds Computation Revisited

Author: A. Apostolico
C.S. Iliopoulos
D. Breslauer
G.S. Brodal
J. Fischer
K. Sadakane
M. Crochemore
O. Berkman
Y. Li
Publication date: 01/01/2011

The notion of the cover is a generalization of a period of a string, and there are linear time algorithms for finding the shortest cover. The seed is a more complicated generalization of periodicity: it is a cover of a superstring of a given string, and the shortest seed problem is of much higher algorithmic difficulty. The problem is not well understood; no linear time algorithm is known. In the paper we give linear time algorithms for some of its versions: computing the shortest left-seed array, computing the longest left-seed array, and checking for seeds of a given length. The algorithm for the last problem is used to compute the seed array of a string (i.e., the shortest seeds for all the prefixes of the string) in $O(n^2)$ time. We describe also a simpler alternative algorithm computing efficiently the shortest seeds. As a by-product we obtain an $O(n\log{(n/m)})$ time algorithm checking if the shortest seed has length at least $m$ and finding the corresponding seed. We also correct some important details missing in the previously known shortest-seed algorithm (Iliopoulos et al., 1996). Comment: 14 pages, accepted to CPM 201
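To make the notion of a cover concrete, the sketch below tests whether the occurrences of a candidate string jointly cover every position of a text and then searches for the shortest such prefix. It is a quadratic brute-force illustration, not one of the linear-time algorithms mentioned in the abstract, and it does not address seeds. The function names are hypothetical.

```python
def is_cover(c: str, s: str) -> bool:
    """Return True if occurrences of c (possibly overlapping) cover all of s."""
    if not c:
        return False
    covered_upto = 0  # positions [0, covered_upto) are covered so far
    start = s.find(c)
    while start != -1:
        if start > covered_upto:
            return False  # a gap that later occurrences cannot fill
        covered_upto = start + len(c)
        start = s.find(c, start + 1)
    return covered_upto == len(s)


def shortest_cover(s: str) -> str:
    """Try prefixes in increasing length; s always covers itself."""
    for length in range(1, len(s) + 1):
        if is_cover(s[:length], s):
            return s[:length]
    return s
```

Only prefixes are tried because any cover must start at position 0. For example, `shortest_cover("abaabaab")` is `"abaab"`, whose two occurrences at positions 0 and 3 overlap and together span the whole string.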

arXiv.org e-Print Archive

CiteSeerX

Crossref

King's Research Portal

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Computing the minimum k-Cover of a string

Author: Cole R.
Iliopoulos C.S.
Mohamed M.
Smyth W.F.
Yang L.
Publication date: 01/01/2003

We study the minimum k-cover problem. For a given string x of length n and an integer k, the minimum k-cover is the minimum set of k-substrings that covers x. We show that the on-line algorithm that has been proposed by Iliopoulos and Smyth [IS92] is not correct. We prove that the problem is in fact NP-hard. Furthermore, we propose two greedy algorithms that are implemented and tested on different kinds of data.

Research Repository

IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences

Author: Alamro H. (Hayam)
Alzamel M. (Mai)
Iliopoulos C.S. (Costas)
Pissis S. (Solon)
Watts S. (Steven)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 06/02/2021

Background: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. Results: We present IUPACpal, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences allowing also for potential mismatches and gaps in the inverted repeats. Conclusion: Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. We show that IUPACpal identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.
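The definition of an inverted repeat given in the Background can be checked directly on a plain A/C/G/T sequence. The sketch below is an illustration of the definition only: it does not reproduce IUPACpal, handles neither IUPAC ambiguity codes nor mismatches, and all names and parameters are hypothetical.

```python
# Watson-Crick complements for the four unambiguous bases only
# (assumption: no IUPAC ambiguity codes such as N or R).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Complement every base and reverse the result."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_inverted_repeat(seq: str, start: int, arm: int, gap: int = 0) -> bool:
    """Check for an exact inverted repeat beginning at `start`.

    The left arm seq[start:start+arm] must be followed, after `gap`
    unconstrained bases, by its reverse complement.
    """
    left = seq[start:start + arm]
    right_start = start + arm + gap
    right = seq[right_start:right_start + arm]
    if len(left) != arm or len(right) != arm:
        return False  # an arm runs past the end of the sequence
    return right == reverse_complement(left)
```

For example, `is_inverted_repeat("AAGCGCTT", 0, 4)` holds because the reverse complement of `AAGC` is `GCTT`; with `gap=2` the same arms are found in `"AAGCTTGCTT"`.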

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

HAL Descartes

King's Research Portal

Hal-Diderot

On the maximal number of cubic subwords in a string

Author: A. Apostolico
A. Thue
A.S. Freankel
C.S. Iliopoulos
D. Damanik
L. Ilie
M. Crochemore
M. Giraud
M. Lothaire
M.G. Main
N.J. Fine
P. Baturo
R.M. Kolpakov
S.J. Puglisi
W. Rytter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

We investigate the problem of the maximum number of cubic subwords (of the form $www$) in a given word. We also consider square subwords (of the form $ww$). The problem of the maximum number of squares in a word is not well understood. Several new results related to this problem are produced in the paper. We consider two simple problems related to the maximum number of subwords which are squares or which are highly repetitive; then we provide a nontrivial estimation for the number of cubes. We show that the maximum number of squares $xx$ such that $x$ is not a primitive word (nonprimitive squares) in a word of length $n$ is exactly $\lfloor \frac{n}{2}\rfloor - 1$, and the maximum number of subwords of the form $x^k$, for $k\ge 3$, is exactly $n-2$. In particular, the maximum number of cubes in a word is not greater than $n-2$ either. Using very technical properties of occurrences of cubes, we improve this bound significantly. We show that the maximum number of cubes in a word of length $n$ is between $(1/2)n$ and $(4/5)n$. (In particular, we improve the lower bound from the conference version of the paper.) Comment: 14 pages

arXiv.org e-Print Archive

CiteSeerX

Crossref

Efficient computation of sequence mappability

Author: Charalampopoulos P. (Panagiotis)
Iliopoulos C.S. (Costas)
Kociumaka T. (Tomasz)
Pissis S. (Solon)
Radoszewski J. (Jakub)
Straszyński J. (Juliusz)
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/02/2022

Sequence mappability is an important task in genome resequencing. In the (k, m)-mappability problem, for a given sequence T of length n, the goal is to compute a table whose ith entry is the number of indices j ≠ i such that the length-m substrings of T starting at positions i and j have at most k mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of k = 1. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for k = O(1), works in O(n) space and, with high probability, in O(n · min{m^k, log^k n}) time. Our algorithm requires a careful adaptation of the k-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop O(n^2)-time algorithms to compute all (k, m)-mappability tables for a fixed m and all k ∈ {0, …, m} or a fixed k and all m ∈ {k, …, n}. Finally, we show that, for k, m = Θ(log n), the (k, m)-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails. This is an improved and extended version of a paper presented at SPIRE 2018.
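The (k, m)-mappability table defined in this abstract can be computed directly from the definition, which may help in reading the problem statement. The sketch below is a quadratic brute-force baseline, not the k-errata-tree algorithm the abstract describes; positions are 0-indexed and the function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def mappability(t: str, k: int, m: int) -> list[int]:
    """Brute-force (k, m)-mappability table of t.

    Entry i counts the start positions j != i whose length-m substring
    differs from the one starting at i in at most k positions.
    Runs in O(n^2 * m) time, for illustration only.
    """
    starts = range(len(t) - m + 1)
    table = []
    for i in starts:
        count = 0
        for j in starts:
            if j != i and hamming(t[i:i + m], t[j:j + m]) <= k:
                count += 1
        table.append(count)
    return table
```

For `t = "abab"` and `m = 2` the substrings are `ab`, `ba`, `ab`, so `mappability("abab", 0, 2)` gives `[1, 0, 1]`, while allowing `k = 2` mismatches gives `[2, 2, 2]`.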

CWI's Institutional Repository

INRIA a CCSD electronic archive server