11 research outputs found
Palindromic Decompositions with Gaps and Errors
Identifying palindromes in sequences has been an interesting line of research
in combinatorics on words and also in computational biology, after the
discovery of the relation of palindromes in the DNA sequence with the HIV
virus. Efficient algorithms for the factorization of sequences into palindromes
and maximal palindromes have been devised in recent years. We extend these
studies by allowing gaps in decompositions and errors in palindromes, and also
imposing a lower bound to the length of acceptable palindromes.
We first present an algorithm for obtaining a palindromic decomposition of a
string of length n with the minimal total gap length in time O(n log n * g) and
space O(n g), where g is the number of allowed gaps in the decomposition. We
then consider a decomposition of the string in maximal \delta-palindromes (i.e.
palindromes with \delta errors under the edit or Hamming distance) and g
allowed gaps. We present an algorithm to obtain such a decomposition with the
minimal total gap length in time O(n (g + \delta)) and space O(n g).Comment: accepted to CSR 201
Finding approximate palindromes in strings
We introduce a novel definition of approximate palindromes in strings, and
provide an algorithm to find all maximal approximate palindromes in a string
with up to errors. Our definition is based on the usual edit operations of
approximate pattern matching, and the algorithm we give, for a string of size
on a fixed alphabet, runs in time. We also discuss two
implementation-related improvements to the algorithm, and demonstrate their
efficacy in practice by means of both experiments and an average-case analysis
Efficient String Matching on Coded Texts
The so called "four Russians technique'' is often used to speed up algorithms by encoding several data items in a single memory cell. Given a sequence of n symbols over a constant size alphabet, one can encode the sequence into O(n / lambda) memory cells in O(log(lambda) ) time using n / log(lambda) processors. This paper presents an efficient CRCW-PRAM string-matching algorithm for coded texts that takes O(log log(m/lambda)) time making only O(n / lambda ) operations, an improvement by a factor of lambda = O(log n) on the number of operations used in previous algorithms. Using this string-matching algorithm one can test if a string is square-free and find all palindromes in a string in O(log log n) time using n / log log n processors
Finding All Periods and Initial Palindromes of a String in Parallel
An optimal O(log log n) time CRCW-PRAM algorithm for computing all periods of a string is presented. Previous parallel algorithms compute the period only if it is shorter than half of the length of the string. This algorithm can be used to find all initial palindromes of a string in the same time and processor bounds. Both algorithms are the fastest possible over a general alphabet. We derive a lower bound for finding palindromes by a modification of a previously known lower bound for finding the period of a string [3]. When p processors are available the bounds become \Theta(d n p e + log log d1+p=ne 2p)
Finding All Periods and Initial Palindromes of a String in Parallel
An optimal O(log log n) time CRCW-PRAM algorithm for computing all period lengths of a string is presented. Previous parallel algorithms compute the period only if it is shorter than half of the length of the string. The algorithm can be used to find all initial palindromes of a string in the same time and processor bounds. Both algorithms are the fastest possible over a general alphabet. We derive a lower bound for finding initial palindromes by modifying a known lower bound for finding the period length of a string [9]. When p processors are available the bounds become \Theta(d n p e+log log d1+p=ne 2p). 1 Introduction A string S[0::n] has a period S[0::p \Gamma 1] of length p if S[i] = S[i + p] for i = 0 \Delta \Delta \Delta n \Gamma p. The period of S[0::n] is defined as its shortest period. Periodicity properties of strings have been studied extensively [18] and are practically used almost in all efficient sequential and parallel string matching algorithms. A palindrome is a ..