
    Palindromic Decompositions with Gaps and Errors

    Identifying palindromes in sequences has been an active line of research in combinatorics on words and in computational biology, following the discovery of a relation between palindromes in DNA sequences and the HIV virus. Efficient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing gaps in decompositions and errors in palindromes, and by imposing a lower bound on the length of acceptable palindromes. We first present an algorithm for obtaining a palindromic decomposition of a string of length $n$ with the minimal total gap length in time $O(ng \log n)$ and space $O(ng)$, where $g$ is the number of allowed gaps in the decomposition. We then consider a decomposition of the string into maximal $\delta$-palindromes (i.e. palindromes with $\delta$ errors under the edit or Hamming distance) and $g$ allowed gaps. We present an algorithm to obtain such a decomposition with the minimal total gap length in time $O(n(g+\delta))$ and space $O(ng)$. Comment: accepted to CSR 201
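    As a point of reference for the problem statement, the sketch below is a naive dynamic program (my own illustration with a hypothetical interface, not the paper's $O(ng \log n)$ algorithm) for the exact-palindrome variant: it returns the minimal total gap length of a decomposition of a string into palindromes of length at least min_len using at most g gap segments.

```python
def min_total_gap_decomposition(x, g, min_len=1):
    """Minimal total gap length of a palindromic decomposition of x that
    uses at most g gap segments and only palindromes of length >= min_len.
    Brute-force O(n^2 * g) reference DP; returns float('inf') if no such
    decomposition exists."""
    n = len(x)
    # is_pal[i][j] is True iff x[i:j] (j exclusive) is a palindrome
    is_pal = [[False] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        is_pal[i][i] = True
    for i in range(n):
        is_pal[i][i + 1] = True
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            is_pal[i][j] = (x[i] == x[j - 1]) and is_pal[i + 1][j - 1]

    INF = float("inf")
    # dp[j][i] = minimal total gap length covering the prefix x[:i]
    #            with at most j gap segments
    dp = [[INF] * (n + 1) for _ in range(g + 1)]
    for j in range(g + 1):
        dp[j][0] = 0
    for j in range(g + 1):
        for i in range(1, n + 1):
            best = INF
            for k in range(i):
                # extend with a sufficiently long palindrome x[k:i] ...
                if i - k >= min_len and is_pal[k][i] and dp[j][k] < best:
                    best = dp[j][k]
                # ... or spend one of the j allowed gaps on x[k:i]
                if j > 0 and dp[j - 1][k] + (i - k) < best:
                    best = dp[j - 1][k] + (i - k)
            dp[j][i] = best
    return dp[g][n]

# e.g. min_total_gap_decomposition("aab", 1) -> 0   ("aa" + "b", no gap)
```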

    Detecting One-variable Patterns

    Given a pattern $p = s_1x_1s_2x_2\cdots s_{r-1}x_{r-1}s_r$ such that $x_1,x_2,\ldots,x_{r-1}\in\{x,\overset{{}_{\leftarrow}}{x}\}$, where $x$ is a variable and $\overset{{}_{\leftarrow}}{x}$ its reversal, and $s_1,s_2,\ldots,s_r$ are strings that contain no variables, we describe an algorithm that constructs in $O(rn)$ time a compact representation of all $P$ instances of $p$ in an input string of length $n$ over a polynomially bounded integer alphabet, so that one can report those instances in $O(P)$ time. Comment: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 201
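    To make the problem concrete, here is a brute-force matcher (a hypothetical interface of my own, not the paper's $O(rn)$ construction) that enumerates all instances of a one-variable pattern by trying every start position and every candidate length for the variable $x$; the variable is assumed non-empty in this sketch.

```python
def one_variable_instances(text, consts, directions):
    """Report all instances of the one-variable pattern
    s_1 x_1 s_2 x_2 ... s_{r-1} x_{r-1} s_r in `text` by brute force.

    consts     -- [s_1, ..., s_r], the constant (variable-free) strings
    directions -- [d_1, ..., d_{r-1}], each '+' (slot holds x) or
                  '-' (slot holds the reversal of x)

    Yields (start, x) pairs; roughly O(n^2 * |p|) time.
    """
    n = len(text)
    r = len(consts)
    fixed = sum(len(s) for s in consts)
    for start in range(n + 1):
        if r == 1:                       # degenerate pattern: no variable
            if text.startswith(consts[0], start):
                yield start, ""
            continue
        max_len = (n - start - fixed) // (r - 1)
        for L in range(1, max_len + 1):  # candidate length of x
            pos, x, ok = start, None, True
            for i, s in enumerate(consts):
                if not text.startswith(s, pos):
                    ok = False
                    break
                pos += len(s)
                if i < r - 1:            # a variable slot follows s_{i+1}
                    piece = text[pos:pos + L]
                    implied = piece if directions[i] == '+' else piece[::-1]
                    if x is None:
                        x = implied
                    elif x != implied:
                        ok = False
                        break
                    pos += L
            if ok:
                yield start, x

# e.g. the pattern  a x b <-x a  over the text "acbbbca":
# list(one_variable_instances("acbbbca", ["a", "b", "a"], ['+', '-']))
# -> [(0, 'cb')]
```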

    Faster subsequence recognition in compressed strings

    Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which are closely related to Lempel--Ziv compression. For an SLP-compressed text of length $\bar m$ and an uncompressed pattern of length $n$, Cégielski et al. gave an algorithm for local subsequence recognition running in time $O(\bar m n^2 \log n)$. We improve the running time to $O(\bar m n^{1.5})$. Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time $O(\bar m n^{1.5})$; the same problem with a compressed pattern is known to be NP-hard.
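    To illustrate the objects involved (not the compressed algorithm itself), the sketch below expands a toy straight-line program and runs the textbook LCS dynamic program on the decompressed text; the point of the paper is precisely to avoid this expansion and work in time depending on the compressed size $\bar m$. The SLP encoding used here (terminal rules as characters, binary rules as pairs) is my own convention.

```python
def expand(slp, symbol, memo=None):
    """Expand nonterminal `symbol` of a straight-line program.
    Each rule is either a single terminal character or a pair of
    symbols (a binary rule A -> B C)."""
    if memo is None:
        memo = {}
    if symbol not in memo:
        rule = slp[symbol]
        if isinstance(rule, str):
            memo[symbol] = rule
        else:
            memo[symbol] = expand(slp, rule[0], memo) + expand(slp, rule[1], memo)
    return memo[symbol]

def lcs_length(t, p):
    """Textbook O(|t| * |p|) longest-common-subsequence DP."""
    m, n = len(t), len(p)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if t[i - 1] == p[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

# A toy SLP generating "abab"; the reuse of X3 is the kind of sharing
# that SLP/Lempel--Ziv compression exploits.
slp = {"X1": "a", "X2": "b", "X3": ("X1", "X2"), "X4": ("X3", "X3")}
print(lcs_length(expand(slp, "X4"), "aab"))   # -> 3
```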

    Faster algorithms for 1-mappability of a sequence

    In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other length-m factors of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n / log log n) and space O(n). We present two algorithms that require worst-case time O(mn) and O(n log^2 n), respectively, and space O(n), thus greatly improving the state of the art. Moreover, we present an algorithm that requires average-case time and space O(n) for integer alphabets if m = Ω(log n / log σ), where σ is the alphabet size.
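    The following brute-force routine (an illustration of the problem definition, not one of the faster algorithms above) computes the k-mappability counts directly in O(n^2 m) time.

```python
def k_mappability(x, m, k):
    """For each length-m factor of x, count the other length-m factors
    of x at Hamming distance at most k. Brute force, O(n^2 * m) time."""
    n = len(x)
    factors = [x[i:i + m] for i in range(n - m + 1)]
    counts = []
    for i, y in enumerate(factors):
        c = 0
        for j, z in enumerate(factors):
            if i == j:
                continue
            # Hamming distance between two equal-length factors
            if sum(a != b for a, b in zip(y, z)) <= k:
                c += 1
        counts.append(c)
    return counts

# e.g. k_mappability("abab", 2, 1) -> [1, 0, 1]
```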

    A framework for space-efficient string kernels

    String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient or incur large slowdowns. We show that a number of exact string kernels, like the $k$-mer kernel, the substring kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in $O(nd)$ time and in $o(n)$ bits of space in addition to the input, using just a $\mathtt{rangeDistinct}$ data structure on the Burrows-Wheeler transform of the input strings, which takes $O(d)$ time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple values of $k$, like the $k$-mer profile and the $k$-th order empirical entropy, and for calibrating the value of $k$ using the data.
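    For intuition about what a $k$-mer kernel measures, the sketch below computes the plain hash-map (spectrum) version: the inner product of the two $k$-mer count vectors, cosine-normalized. This is the straightforward word-RAM formulation, not the $o(n)$-bit BWT-based framework of the paper; the function name is my own.

```python
from collections import Counter
from math import sqrt

def kmer_kernel(s, t, k):
    """Cosine-normalized k-mer (spectrum) kernel between strings s and t:
    dot product of their k-mer count vectors divided by the vector norms."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    dot = sum(cs[w] * ct[w] for w in cs.keys() & ct.keys())
    norm = sqrt(sum(v * v for v in cs.values()) *
                sum(v * v for v in ct.values()))
    return dot / norm if norm else 0.0

# e.g. kmer_kernel("ACGT", "ACGA", 2) -> 2/3 (shares the 2-mers AC and CG)
```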

    A Characterization of Bispecial Sturmian Words

    A finite Sturmian word w over the alphabet {a,b} is left special (resp. right special) if aw and bw (resp. wa and wb) are both Sturmian words. A bispecial Sturmian word is a Sturmian word that is both left special and right special. We show as a main result that bispecial Sturmian words are exactly the maximal internal factors of Christoffel words, which are the words coding the digital approximations of segments in the Euclidean plane. This result is an extension of the known relation between central words and primitive Christoffel words. Our characterization allows us to give an enumerative formula for bispecial Sturmian words. We also investigate the minimal forbidden words for the set of Sturmian words. Comment: Accepted to MFCS 201
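    The brute-force check below illustrates the definitions, assuming the classical characterization of finite Sturmian words as exactly the balanced binary words; it is my own sketch, unrelated to the enumerative results of the paper.

```python
def is_balanced(w):
    """A finite word over {a, b} is (finite) Sturmian iff it is balanced:
    for every length l, the number of a's in its length-l factors varies
    by at most 1. Brute force, O(|w|^3)."""
    n = len(w)
    for l in range(1, n + 1):
        counts = {w[i:i + l].count('a') for i in range(n - l + 1)}
        if max(counts) - min(counts) > 1:
            return False
    return True

def is_bispecial_sturmian(w):
    """w is bispecial Sturmian if aw, bw, wa and wb are all Sturmian,
    i.e. balanced (this also forces w itself to be Sturmian)."""
    return all(is_balanced(u) for u in ('a' + w, 'b' + w, w + 'a', w + 'b'))

# e.g. "aba", the maximal internal factor of the Christoffel word "aabab":
print(is_bispecial_sturmian("aba"))   # -> True
```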