    Using string-matching to analyze hypertext navigation

    A method of using string-matching to analyze hypertext navigation was developed and evaluated using two weeks of website logfile data. The method has two phases: (i) exact string-matching to find subsequences of links that were repeated in different navigation sessions (common trails through the website), and then (ii) inexact matching to find other similar sessions (a community of users with a similar interest). The evaluation showed how subsequences could be used to understand the information pathways users chose to follow within a website, and that exact and inexact matching provided complementary ways of identifying information that may have been of interest to a whole community of users but was found by only a minority. This illustrates how string-matching could be used to improve the structure of hypertext collections.
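The two phases can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `common_trails` and `similar_sessions`, the use of contiguous page runs as "trails", the similarity threshold, and the choice of `difflib.SequenceMatcher` for inexact matching are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def common_trails(sessions, min_len=2, min_sessions=2):
    """Phase (i), exact matching: contiguous page runs seen in several sessions.

    `sessions` is a list of page-ID lists (hypothetical log format).
    Returns {trail_tuple: number_of_sessions_containing_it}.
    """
    seen = defaultdict(set)
    for sid, pages in enumerate(sessions):
        for i in range(len(pages)):
            for j in range(i + min_len, len(pages) + 1):
                seen[tuple(pages[i:j])].add(sid)
    return {t: len(s) for t, s in seen.items() if len(s) >= min_sessions}


def similar_sessions(sessions, trail, threshold=0.5):
    """Phase (ii), inexact matching: sessions that resemble a given trail.

    Similarity is difflib's ratio (an assumed stand-in for the paper's measure).
    """
    target = list(trail)
    return [
        sid
        for sid, pages in enumerate(sessions)
        if SequenceMatcher(None, pages, target).ratio() >= threshold
    ]


sessions = [
    ["home", "news", "sport"],
    ["home", "news", "weather"],
    ["about", "contact"],
]
trails = common_trails(sessions)
community = similar_sessions(sessions, ("home", "news"))
```

Enumerating every contiguous run is quadratic in session length, which is acceptable for short sessions; a suffix-based index would be the natural replacement for larger logs.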

    Fixed-Parameter Algorithms for Longest Heapable Subsequence and Maximum Binary Tree

    A heapable sequence is a sequence of numbers that can be arranged in a min-heap data structure. Finding a longest heapable subsequence of a given sequence was proposed by Byers, Heeringa, Mitzenmacher, and Zervas (ANALCO 2011) as a generalization of the well-studied longest increasing subsequence problem, and its complexity remains open. An equivalent formulation of the longest heapable subsequence problem is that of finding a maximum-sized binary tree in a given permutation directed acyclic graph (permutation DAG). In this work, we study parameterized algorithms for both longest heapable subsequence and maximum-sized binary tree. We introduce alphabet size as a new parameter in the study of computational problems in permutation DAGs and show that this parameter with respect to a fixed topological ordering admits a complete characterization and a polynomial time algorithm. We believe that this parameter is likely to be useful in the context of optimization problems defined over permutation DAGs.
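To make the notion of heapability concrete, here is a minimal Python sketch that decides whether a whole sequence is heapable. It assumes the usual sequential definition (elements are inserted in order, each new element becomes a child of an earlier element with a value no larger, and each node has at most two children). It does not address the longest heapable subsequence problem studied in the paper; the function name `is_heapable` is hypothetical.

```python
from bisect import bisect_right, insort


def is_heapable(seq):
    """Greedy check: attach each element to the free child slot whose
    parent value is the largest one not exceeding the element.

    Keeping small-valued slots free is never worse, because a slot under
    a parent v can host any later element >= v.
    """
    slots = []  # sorted parent values, one entry per free child slot
    for i, x in enumerate(seq):
        if i > 0:
            k = bisect_right(slots, x)  # slots[:k] have parent value <= x
            if k == 0:
                return False  # no node can take x as a child
            slots.pop(k - 1)  # use the largest admissible parent
        # The new node contributes two free child slots.
        insort(slots, x)
        insort(slots, x)
    return True
```

For example, `[3, 4, 5, 3]` is heapable (5 goes under 4, leaving a slot under the root for the final 3), whereas `[1, 4, 3, 2]` is not: 4 and 3 must both be children of 1, so 2 has no admissible parent.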

    Computing longest common square subsequences

    A square is a non-empty string of the form YY. The longest common square subsequence (LCSqS) problem is to compute a longest square occurring as a subsequence in two given strings A and B. We show that the problem can easily be solved in O(n^6) time or O(|M|n^4) time with O(n^4) space, where n is the length of the strings and M is the set of matching points between A and B. Then, we show that the problem can also be solved in O(sigma |M|^3 + n) time and O(|M|^2 + n) space, or in O(|M|^3 log^2 n log log n + n) time with O(|M|^3 + n) space, where sigma is the number of distinct characters occurring in A and B. We also study lower bounds for the LCSqS problem for two or more strings.
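One plausible reading of the O(n^6) bound is a brute-force approach: YY is a subsequence of a string exactly when the string can be split so that Y is a subsequence of both halves, so one can try every split of A and of B and compute a four-string LCS for each. The Python sketch below follows that reading; it is an assumption about the baseline rather than the paper's stated method, and the names `lcs4_len` and `lcsqs_len` are hypothetical.

```python
from functools import lru_cache


def lcs4_len(w, x, y, z):
    """Length of a longest common subsequence of four strings (memoized DP)."""

    @lru_cache(maxsize=None)
    def go(a, b, c, d):
        if a == 0 or b == 0 or c == 0 or d == 0:
            return 0
        if w[a - 1] == x[b - 1] == y[c - 1] == z[d - 1]:
            return go(a - 1, b - 1, c - 1, d - 1) + 1
        return max(
            go(a - 1, b, c, d),
            go(a, b - 1, c, d),
            go(a, b, c - 1, d),
            go(a, b, c, d - 1),
        )

    return go(len(w), len(x), len(y), len(z))


def lcsqs_len(A, B):
    """Length of a longest common square subsequence of A and B (0 if none)."""
    best = 0
    for i in range(1, len(A)):
        for j in range(1, len(B)):
            best = max(best, lcs4_len(A[:i], A[i:], B[:j], B[j:]))
    return 2 * best
```

With O(n^2) split pairs and O(n^4) DP states per pair, the total is O(n^6) on strings of length n. The recursive form is only suitable for short inputs; an iterative table would avoid recursion-depth limits.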

    Streaming and Small Space Approximation Algorithms for Edit Distance and Longest Common Subsequence


    A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics

    From the output produced by a memoryless deletion channel from a uniformly random input of known length n, one obtains a posterior distribution on the channel input. The difference between the Shannon entropy of this distribution and that of the uniform prior measures the amount of information about the channel input which is conveyed by the output of length m, and it is natural to ask for which outputs this is extremized. This question was posed in a previous work, where it was conjectured on the basis of experimental data that the entropy of the posterior is minimized and maximized by the constant strings 000… and 111… and the alternating strings 0101… and 1010… respectively. In the present work we confirm the minimization conjecture in the asymptotic limit using results from hidden word statistics. We show how the analytic-combinatorial methods of Flajolet, Szpankowski and Vallée for dealing with the hidden pattern matching problem can be applied to resolve the case of fixed output length and n → ∞, by obtaining estimates for the entropy in terms of the moments of the posterior distribution and establishing its minimization via a measure of autocorrelation. Comment: 11 pages, 2 figures.
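For small n the posterior entropy can be computed by brute force, which is a convenient way to see the conjectured extremes. The Python sketch below assumes binary inputs and i.i.d. deletions, so that the likelihood of output y given input x is proportional to the number of ways y embeds in x as a subsequence; with a uniform prior the deletion probability then cancels out of the posterior. The function names are hypothetical, and this is not the analytic method of the paper.

```python
from itertools import product
from math import log2


def count_embeddings(x, y):
    """Number of ways y occurs in x as a (not necessarily contiguous) subsequence."""
    ways = [1] + [0] * len(y)  # ways[k]: embeddings of y[:k] seen so far
    for ch in x:
        for k in range(len(y), 0, -1):  # descending so each ch is used once
            if y[k - 1] == ch:
                ways[k] += ways[k - 1]
    return ways[len(y)]


def posterior_entropy(y, n):
    """Shannon entropy (bits) of the posterior over length-n binary inputs
    given output y, under a uniform prior."""
    counts = [count_embeddings(x, y) for x in product("01", repeat=n)]
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)
```

For n = 3 the constant output `"00"` gives a lower posterior entropy than the alternating output `"01"`, in line with the conjectured pattern; the enumeration is exponential in n and only practical for small cases.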

    Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS

    We study sketching and streaming algorithms for the Longest Common Subsequence problem (LCS) on strings of small alphabet size |Sigma|. For the problem of deciding whether the LCS of strings x,y has length at least L, we obtain a sketch size and streaming space usage of O(L^{|Sigma| - 1} log L). We also prove matching unconditional lower bounds. As an application, we study a variant of LCS where each alphabet symbol is equipped with a weight that is given as input, and the task is to compute a common subsequence of maximum total weight. Using our sketching algorithm, we obtain an O(min{nm, n + m^{|Sigma|}})-time algorithm for this problem, on strings x,y of length n,m, with n >= m. We prove optimality of this running time up to lower order factors, assuming the Strong Exponential Time Hypothesis.
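The O(nm) term in the running time corresponds to the textbook dynamic program for LCS, adapted so that a match contributes the symbol's weight instead of 1. The Python sketch below shows only that baseline; it is not the sketching-based algorithm of the paper, and the function name `weighted_lcs` and the dictionary format for weights are assumptions.

```python
def weighted_lcs(x, y, weight):
    """Maximum total weight of a common subsequence of x and y.

    `weight` maps each symbol to its weight. Runs in O(len(x) * len(y))
    time and O(len(y)) space using two rolling rows.
    """
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, start=1):
            best = max(prev[j], cur[j - 1])  # skip a symbol of x or of y
            if cx == cy:
                best = max(best, prev[j - 1] + weight[cx])  # take the match
            cur.append(best)
        prev = cur
    return prev[-1]


w = {"a": 1, "b": 5, "c": 2}
best_weight = weighted_lcs("abc", "acb", w)
```

Here the common subsequences `"ab"` and `"ac"` have the same length, but `"ab"` wins on weight (1 + 5 versus 1 + 2), which is exactly where the weighted variant departs from plain LCS.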