180 research outputs found
Palindromic Decompositions with Gaps and Errors
Identifying palindromes in sequences has been an interesting line of research
in combinatorics on words and also in computational biology, after the
discovery of the relation of palindromes in the DNA sequence with the HIV
virus. Efficient algorithms for the factorization of sequences into palindromes
and maximal palindromes have been devised in recent years. We extend these
studies by allowing gaps in decompositions and errors in palindromes, and also
imposing a lower bound to the length of acceptable palindromes.
We first present an algorithm for obtaining a palindromic decomposition of a
string of length n with the minimal total gap length in time O(n log n * g) and
space O(n g), where g is the number of allowed gaps in the decomposition. We
then consider a decomposition of the string in maximal \delta-palindromes (i.e.
palindromes with \delta errors under the edit or Hamming distance) and g
allowed gaps. We present an algorithm to obtain such a decomposition with the
minimal total gap length in time O(n (g + \delta)) and space O(n g).Comment: accepted to CSR 201
Prospecting of rhizobium for soy cultivation in soils with deficient natural drainage in the pampa biome.
Detecting One-variable Patterns
Given a pattern such that
, where is a
variable and its reversal, and
are strings that contain no variables, we describe an
algorithm that constructs in time a compact representation of all
instances of in an input string of length over a polynomially bounded
integer alphabet, so that one can report those instances in time.Comment: 16 pages (+13 pages of Appendix), 4 figures, accepted to SPIRE 201
Faster subsequence recognition in compressed strings
Computation on compressed strings is one of the key approaches to processing
massive data sets. We consider local subsequence recognition problems on
strings compressed by straight-line programs (SLP), which is closely related to
Lempel--Ziv compression. For an SLP-compressed text of length , and an
uncompressed pattern of length , C{\'e}gielski et al. gave an algorithm for
local subsequence recognition running in time . We improve
the running time to . Our algorithm can also be used to
compute the longest common subsequence between a compressed text and an
uncompressed pattern in time ; the same problem with a
compressed pattern is known to be NP-hard
Faster algorithms for 1-mappability of a sequence
In the k-mappability problem, we are given a string x of length n and
integers m and k, and we are asked to count, for each length-m factor y of x,
the number of other factors of length m of x that are at Hamming distance at
most k from y. We focus here on the version of the problem where k = 1. The
fastest known algorithm for k = 1 requires time O(mn log n/ log log n) and
space O(n). We present two algorithms that require worst-case time O(mn) and
O(n log^2 n), respectively, and space O(n), thus greatly improving the state of
the art. Moreover, we present an algorithm that requires average-case time and
space O(n) for integer alphabets if m = {\Omega}(log n/ log {\sigma}), where
{\sigma} is the alphabet size
A framework for space-efficient string kernels
String kernels are typically used to compare genome-scale sequences whose
length makes alignment impractical, yet their computation is based on data
structures that are either space-inefficient, or incur large slowdowns. We show
that a number of exact string kernels, like the -mer kernel, the substrings
kernels, a number of length-weighted kernels, the minimal absent words kernel,
and kernels with Markovian corrections, can all be computed in time and
in bits of space in addition to the input, using just a
data structure on the Burrows-Wheeler transform of the
input strings, which takes time per element in its output. The same
bounds hold for a number of measures of compositional complexity based on
multiple value of , like the -mer profile and the -th order empirical
entropy, and for calibrating the value of using the data
Efeito da concentração do inóculo na produção de poli(3-hidroxibutirato) por pseudomonas sp. cmm43.
A Characterization of Bispecial Sturmian Words
A finite Sturmian word w over the alphabet {a,b} is left special (resp. right
special) if aw and bw (resp. wa and wb) are both Sturmian words. A bispecial
Sturmian word is a Sturmian word that is both left and right special. We show
as a main result that bispecial Sturmian words are exactly the maximal internal
factors of Christoffel words, that are words coding the digital approximations
of segments in the Euclidean plane. This result is an extension of the known
relation between central words and primitive Christoffel words. Our
characterization allows us to give an enumerative formula for bispecial
Sturmian words. We also investigate the minimal forbidden words for the set of
Sturmian words.Comment: Accepted to MFCS 201
Coleção de culturas de microrganismos multifuncionais da embrapa clima temperado: implementação de boas práticas.
Identification of pesticide-degrading Pseudomonas strains as poly-B-hydroxybutyrate producers.
201
- …