Palindrome Recognition In The Streaming Model
In the Palindrome Problem one tries to find all palindromes (palindromic
substrings) in a given string. A palindrome is defined as a string which reads
forwards the same as backwards, e.g., the string "racecar". A related problem
is the Longest Palindromic Substring Problem in which finding an arbitrary one
of the longest palindromes in the given string suffices. We regard the
streaming version of both problems. In the streaming model the input arrives
over time and at every point in time we are only allowed to use sublinear
space. The main algorithms in this paper are the following: The first one is a
one-pass randomized algorithm that solves the Palindrome Problem. It has an
additive error and uses O(sqrt(n)) space. The second algorithm is a two-pass
algorithm which determines the exact locations of all longest palindromes. It
uses the first algorithm as the first pass. The third algorithm is again a
one-pass randomized algorithm, which solves the Longest Palindromic Substring
Problem. It has a multiplicative error using only O(log(n)) space. We also
give two variants of the first algorithm which solve other related practical
problems.
Palindrome Recognition In The Streaming Model
A palindrome is defined as a string which reads forwards the same as backwards, like, for example, the string "racecar". In the Palindrome Problem, one tries to find all palindromes in a given string. In contrast, in the case of the Longest Palindromic Substring Problem, the goal is to find an arbitrary one of the longest palindromes in the string.
In this paper we present three algorithms in the streaming model for the above problems, where at any point in time we are only allowed to use sublinear space. We first present a one-pass randomized algorithm that solves the Palindrome Problem. It has an additive error and uses square root of n space. We also give two variants of the algorithm which solve related and practical problems. The second algorithm determines the exact locations of all longest palindromes using two passes and square root of n space. The third algorithm is a one-pass randomized algorithm, which solves the Longest Palindromic Substring Problem. It has a multiplicative error using only O(log(n)) space.
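To make the streaming setting concrete, the following is a minimal sketch of one-pass palindrome testing with constant working memory. It is not the algorithm from the paper above: it assumes a Karp-Rabin style fingerprint (the modulus `MOD`, the random `base`, and the function name are illustrative choices) and only reports which prefixes of the stream are palindromes. True palindromic prefixes are always reported; a non-palindromic prefix is reported only with very small probability.

```python
import random

MOD = (1 << 61) - 1  # a Mersenne prime, picked here purely for illustration


def palindromic_prefix_lengths(stream, base=None):
    """Yield the length of each prefix of `stream` that is (probably) a palindrome.

    Only three integers are kept between symbols, so the working state is
    independent of how many symbols have been read.
    """
    if base is None:
        base = random.randrange(256, MOD - 1)  # hypothetical choice of base
    fwd = 0    # fingerprint of the prefix read left to right
    rev = 0    # fingerprint of the same prefix read right to left
    power = 1  # base ** (symbols read so far), modulo MOD
    for length, ch in enumerate(stream, start=1):
        c = ord(ch) + 1
        fwd = (fwd * base + c) % MOD
        rev = (rev + c * power) % MOD
        power = (power * base) % MOD
        if fwd == rev:  # equal fingerprints: prefix equals its reverse w.h.p.
            yield length
```

For example, `list(palindromic_prefix_lengths("abacaba"))` reports the prefixes "a", "aba", and "abacaba". Finding palindromes that start at arbitrary positions needs more state than this sketch keeps.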
Streaming for Aibohphobes: Longest Palindrome with Mismatches
A palindrome is a string that reads the same as its reverse, such as "aibohphobia" (fear of palindromes).
Given a metric and an integer d>0, a d-near-palindrome is a string of Hamming distance at most d from its reverse.
We study the natural problem of identifying the longest d-near-palindrome in data streams. The problem is relevant to the analysis of DNA databases, and to the task of repairing recursive structures in documents such as XML and JSON.
We present the first streaming algorithm for the longest d-near-palindrome problem that returns a d-near-palindrome whose length is within a multiplicative $(1+\varepsilon)$-factor of the longest d-near-palindrome.
Our algorithm also returns the set of mismatched indices in the d-near-palindrome, and uses $O\left(\frac{d\log^7 n}{\varepsilon\log(1+\varepsilon)}\right)$ bits of space, and $O\left(\frac{d\log^6 n}{\varepsilon\log(1+\varepsilon)}\right)$ update time per arrival symbol.
We show that for $d=o(\sqrt{n})$, any randomized algorithm with multiplicative approximation $(1+\varepsilon)$ that succeeds with probability at least $1-\frac{1}{n}$ requires $\Omega(d\log n)$ space.
We further obtain a streaming algorithm that returns a d-near-palindrome whose length is within an additive E-error of the longest d-near-palindrome.
The algorithm uses $O\left(\frac{dn\log^6 n}{E}\right)$ bits of space and $O\left(\frac{dn\log^5 n}{E}\right)$ update time. As before, we show that any randomized streaming algorithm that solves the longest d-near-palindrome problem for additive error E with probability at least $1-\frac{1}{n}$ uses $\Omega\left(\frac{dn}{E}\right)$ space.
Finally, we give an exact two-pass algorithm that solves the longest d-near-palindrome problem using $O(d^2\sqrt{n}\log^6 n)$ bits of space.
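The definition of a d-near-palindrome can be checked directly when the whole string is available. The sketch below is an offline illustration of the definition only, not the streaming algorithm described above; the function names are hypothetical.

```python
def mismatched_indices(s):
    """Return every position i where s[i] differs from the reverse of s at i."""
    n = len(s)
    return [i for i in range(n) if s[i] != s[n - 1 - i]]


def is_d_near_palindrome(s, d):
    """True if the Hamming distance between s and its reverse is at most d."""
    return len(mismatched_indices(s)) <= d
```

Because position i mismatches exactly when its mirror position does, the mismatched indices come in mirrored pairs and the Hamming distance to the reverse is always even. For "abcda" the mismatches are at positions 1 and 3, so the string is a 2-near-palindrome but not a 1-near-palindrome.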
Tight Tradeoffs for Real-Time Approximation of Longest Palindromes in Streams
We consider computing a longest palindrome in the streaming model, where the symbols arrive one-by-one and we do not have random access to the input. While computing the answer exactly using sublinear space is not possible in such a setting, one can still hope for a good approximation guarantee. Our contribution is twofold. First, we provide lower bounds on the space requirements for randomized approximation algorithms processing inputs of length n. We rule out Las Vegas algorithms, as they cannot achieve sublinear space complexity. For Monte Carlo algorithms, we prove a lower bound of Omega(M log min {|Sigma|, M}) bits of memory; here M=n/E for approximating the answer with additive error E, and M= log n / log (1 + epsilon) for approximating the answer with multiplicative error (1 + epsilon). Second, we design three real-time algorithms for this problem. Our Monte Carlo approximation algorithms for both additive and multiplicative versions of the problem use O(M) words of memory. Thus the obtained lower bounds are asymptotically tight up to a logarithmic factor. The third algorithm is deterministic and finds a longest palindrome exactly if it is short. This algorithm can be run in parallel with a Monte Carlo algorithm to obtain better results in practice. Overall, both the time and space complexity of finding a longest palindrome in a stream are essentially settled.
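To get a feel for the parameter M in the bounds above, the two formulas can be evaluated for concrete inputs. This is only arithmetic on the stated definitions; the function names are hypothetical, and the base of the logarithm cancels in the ratio.

```python
import math


def m_additive(n, error):
    """M = n / E for additive error E."""
    return n / error


def m_multiplicative(n, eps):
    """M = log n / log(1 + eps) for multiplicative error (1 + eps)."""
    return math.log(n) / math.log(1 + eps)
```

For a stream of length 1,000,000 and additive error 1,000 this gives M = 1000, while for length 1024 and eps = 1 it gives M of about 10, which shows how much smaller M is in the multiplicative setting.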
Small-Space Algorithms for the Online Language Distance Problem for Palindromes and Squares
We study the online variant of the language distance problem for two
classical formal languages, the language of palindromes and the language of
squares, and for the two most fundamental distances, the Hamming distance and
the edit (Levenshtein) distance. In this problem, defined for a fixed formal
language, we are given a string, and the task is to
compute the minimal distance to the language from every prefix of the string. We focus on the
low-distance regime, where one must compute only the distances smaller than a
given threshold. In this work, our contribution is twofold:
- First, we show streaming algorithms, which access the input string only
through a single left-to-right scan. Both for palindromes and squares, our
algorithms use space and time per character in
the Hamming-distance case and space and time
per character in the edit-distance case. These algorithms are randomised by
necessity, and they err with probability inverse-polynomial in the input length.
- Second, we show deterministic read-only online algorithms, which are also
provided with read-only random access to the already processed characters of
the input string. Both for palindromes and squares, our algorithms use space and time per character in the
Hamming-distance case and space and
amortised time per character in the edit-distance case.
Comment: Accepted to ISAAC'2
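For the Hamming-distance case with palindromes, the quantity being computed can be illustrated with a brute-force baseline. This sketch is not one of the small-space algorithms from the abstract above: it stores the whole string, takes quadratic time, and ignores the threshold. The function names are hypothetical.

```python
def hamming_distance_to_palindrome(s):
    """Minimum number of substitutions that turn s into a palindrome.

    Each mismatched mirrored pair (s[i], s[n-1-i]) needs exactly one change.
    """
    n = len(s)
    return sum(1 for i in range(n // 2) if s[i] != s[n - 1 - i])


def prefix_distances(text):
    """Distance to the language of palindromes for every non-empty prefix."""
    return [hamming_distance_to_palindrome(text[:k])
            for k in range(1, len(text) + 1)]
```

For the input "abca" the prefixes "a", "ab", "abc", and "abca" have distances 0, 1, 1, and 1.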
Faster Queries for Longest Substring Palindrome After Block Edit
Palindromes are important objects in strings which have been extensively studied from combinatorial, algorithmic, and bioinformatics points of view. Manacher [J. ACM 1975] proposed a seminal algorithm that computes the longest substring palindromes (LSPals) of a given string in O(n) time, where n is the length of the string. In this paper, we consider the problem of finding the LSPal after the string is edited. We present an algorithm that uses O(n) time and space for preprocessing, and reports the length of the LSPals in O(l + log log n) time, after a substring of the string T is replaced by a string of arbitrary length l. This outperforms the query algorithm proposed in our previous work [CPM 2018] that uses O(l + log n) time for each query.
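The linear-time computation credited to Manacher above can be sketched as follows. This is a common textbook formulation of the static problem only (no edits or queries), and the separator trick and function name are choices made for this example rather than details taken from the paper.

```python
def longest_palindromic_substring(s):
    """Return one longest palindromic substring of s in O(len(s)) time."""
    # Interleave a separator so odd and even palindromes share one code path.
    t = "#" + "#".join(s) + "#"
    n = len(t)
    radius = [0] * n      # radius[i]: palindrome radius around t[i]
    center = right = 0    # rightmost palindrome found so far: [.., right]
    for i in range(n):
        if i < right:
            # Reuse the radius of the mirror position inside that palindrome.
            radius[i] = min(right - i, radius[2 * center - i])
        lo, hi = i - radius[i] - 1, i + radius[i] + 1
        while lo >= 0 and hi < n and t[lo] == t[hi]:
            radius[i] += 1
            lo -= 1
            hi += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    best = max(range(n), key=radius.__getitem__)
    start = (best - radius[best]) // 2  # map back to an index in s
    return s[start:start + radius[best]]
```

A radius in the interleaved string equals the palindrome's length in the original string, which is why the final slice uses `radius[best]` directly. When several longest palindromes exist, this sketch returns the leftmost one.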
- …