971 research outputs found
Using string-matching to analyze hypertext navigation
A method of using string-matching to analyze hypertext navigation was developed, and evaluated using two weeks of website logfile data. The method is divided into phases that use: (i) exact string-matching to calculate subsequences of links that were repeated in different navigation sessions (common trails through the website), and then (ii) inexact matching to find other similar sessions (a community of users with a similar interest). The evaluation showed how subsequences could be used to understand the information pathways users chose to follow within a website, and that exact and inexact matching provided complementary ways of identifying information that may have been of interest to a whole community of users, but which was only found by a minority. This illustrates how string-matching could be used to improve the structure of hypertext collections
Fixed-Parameter Algorithms for Longest Heapable Subsequence and Maximum Binary Tree
A heapable sequence is a sequence of numbers that can be arranged in a min-heap data structure. Finding a longest heapable subsequence of a given sequence was proposed by Byers, Heeringa, Mitzenmacher, and Zervas (ANALCO 2011) as a generalization of the well-studied longest increasing subsequence problem and its complexity still remains open. An equivalent formulation of the longest heapable subsequence problem is that of finding a maximum-sized binary tree in a given permutation directed acyclic graph (permutation DAG). In this work, we study parameterized algorithms for both longest heapable subsequence and maximum-sized binary tree. We introduce alphabet size as a new parameter in the study of computational problems in permutation DAGs and show that this parameter with respect to a fixed topological ordering admits a complete characterization and a polynomial time algorithm. We believe that this parameter is likely to be useful in the context of optimization problems defined over permutation DAGs
Computing longest common square subsequences
A square is a non-empty string of form YY. The longest common square subsequence (LCSqS) problem is to compute a longest square occurring as a subsequence in two given strings A and B. We show that the problem can easily be solved in O(n^6) time or O(|M|n^4) time with O(n^4) space, where n is the length of the strings and M is the set of matching points between A and B. Then, we show that the problem can also be solved in O(sigma |M|^3 + n) time and O(|M|^2 + n) space, or in O(|M|^3 log^2 n log log n + n) time with O(|M|^3 + n) space, where sigma is the number of distinct characters occurring in A and B. We also study lower bounds for the LCSqS problem for two or more strings
A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics
From the output produced by a memoryless deletion channel from a uniformly
random input of known length , one obtains a posterior distribution on the
channel input. The difference between the Shannon entropy of this distribution
and that of the uniform prior measures the amount of information about the
channel input which is conveyed by the output of length , and it is natural
to ask for which outputs this is extremized. This question was posed in a
previous work, where it was conjectured on the basis of experimental data that
the entropy of the posterior is minimized and maximized by the constant strings
and and the alternating strings
and respectively. In the present
work we confirm the minimization conjecture in the asymptotic limit using
results from hidden word statistics. We show how the analytic-combinatorial
methods of Flajolet, Szpankowski and Vall\'ee for dealing with the hidden
pattern matching problem can be applied to resolve the case of fixed output
length and , by obtaining estimates for the entropy in
terms of the moments of the posterior distribution and establishing its
minimization via a measure of autocorrelation.Comment: 11 pages, 2 figure
Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS
We study sketching and streaming algorithms for the Longest Common Subsequence problem (LCS) on strings of small alphabet size |Sigma|. For the problem of deciding whether the LCS of strings x,y has length at least L, we obtain a sketch size and streaming space usage of O(L^{|Sigma| - 1} log L). We also prove matching unconditional lower bounds.
As an application, we study a variant of LCS where each alphabet symbol is equipped with a weight that is given as input, and the task is to compute a common subsequence of maximum total weight. Using our sketching algorithm, we obtain an O(min{nm, n + m^{|Sigma|}})-time algorithm for this problem, on strings x,y of length n,m, with n >= m. We prove optimality of this running time up to lower order factors, assuming the Strong Exponential Time Hypothesis
- …