Using string-matching to analyze hypertext navigation

Author: Ruddle R.A.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2006

A method of using string-matching to analyze hypertext navigation was developed and evaluated using two weeks of website logfile data. The method has two phases: (i) exact string-matching to calculate subsequences of links that were repeated in different navigation sessions (common trails through the website), and then (ii) inexact matching to find other similar sessions (a community of users with a similar interest). The evaluation showed how subsequences could be used to understand the information pathways users chose to follow within a website, and that exact and inexact matching provided complementary ways of identifying information that may have been of interest to a whole community of users but was found by only a minority. This illustrates how string-matching could be used to improve the structure of hypertext collections.
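
The exact-matching phase described above can be illustrated by counting link trails that recur across sessions. This is a minimal sketch and not the paper's method: it assumes each session is a list of page identifiers and that a trail is a contiguous run of fixed length. The function name `common_trails`, its parameters, and the sample sessions are hypothetical.

```python
from collections import Counter

def common_trails(sessions, length=2, min_sessions=2):
    """Return contiguous trails of `length` pages seen in at least `min_sessions` sessions."""
    seen_in = Counter()
    for pages in sessions:
        # A set ensures a trail repeated within one session is counted once.
        trails = {tuple(pages[i:i + length]) for i in range(len(pages) - length + 1)}
        seen_in.update(trails)
    return {trail: n for trail, n in seen_in.items() if n >= min_sessions}

# Hypothetical sessions: only the trail a -> b occurs in two of them.
sessions = [
    ["home", "a", "b"],
    ["x", "a", "b"],
    ["home", "c"],
]
print(common_trails(sessions))
```

Counting per session rather than per occurrence matches the abstract's emphasis on trails shared between different sessions.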

White Rose Research Online

Fixed-Parameter Algorithms for Longest Heapable Subsequence and Maximum Binary Tree

Author: Chandrasekaran Karthekeyan
Grigorescu Elena
Istrate Gabriel
Kulkarni Shubhang
Lin Young-San
Zhu Minshen
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 15th International Symposium on Parameterized and Exact Computation (IPEC 2020)
Publication date: 01/01/2020

A heapable sequence is a sequence of numbers that can be arranged in a min-heap data structure. Finding a longest heapable subsequence of a given sequence was proposed by Byers, Heeringa, Mitzenmacher, and Zervas (ANALCO 2011) as a generalization of the well-studied longest increasing subsequence problem, and its complexity remains open. An equivalent formulation of the longest heapable subsequence problem is that of finding a maximum-sized binary tree in a given permutation directed acyclic graph (permutation DAG). In this work, we study parameterized algorithms for both longest heapable subsequence and maximum-sized binary tree. We introduce alphabet size as a new parameter in the study of computational problems in permutation DAGs and show that this parameter with respect to a fixed topological ordering admits a complete characterization and a polynomial time algorithm. We believe that this parameter is likely to be useful in the context of optimization problems defined over permutation DAGs.
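
To make the notion of a heapable sequence concrete, the sketch below checks whether a whole sequence is heapable. It does not solve the longest heapable subsequence problem, whose complexity the abstract describes as open. It assumes elements are inserted in order, each as a new leaf under a parent whose value is less than or equal to it, in a binary tree that need not be complete. The greedy rule (attach each element to the free slot with the largest admissible parent value) and the name `is_heapable` are illustrative choices, not taken from the paper.

```python
from bisect import bisect_right, insort

def is_heapable(seq):
    """Greedy check: can seq be inserted in order into a binary min-heap-ordered tree?"""
    slots = []  # sorted parent values, one entry per free child slot
    for i, x in enumerate(seq):
        if i == 0:
            slots = [x, x]  # the root offers two child slots
            continue
        # Largest parent value <= x; smaller slots stay available for later elements.
        pos = bisect_right(slots, x) - 1
        if pos < 0:
            return False  # no existing node can be the parent of x
        slots.pop(pos)
        insort(slots, x)
        insort(slots, x)
    return True

print(is_heapable([1, 3, 2, 4, 5]))
print(is_heapable([2, 1]))
```

Consuming the slot with the largest admissible value is safe because a slot with a smaller parent value accepts every element that a larger one does.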

Dagstuhl Research Online Publication Server

Computing longest common square subsequences

Author: Bannai Hideo
Inenaga Shunsuke
Inoue Takafumi
Takeda Masayuki
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Annual Symposium on Combinatorial Pattern Matching (CPM 2018)
Publication date: 01/01/2018

A square is a non-empty string of the form YY. The longest common square subsequence (LCSqS) problem is to compute a longest square occurring as a subsequence in two given strings A and B. We show that the problem can easily be solved in O(n^6) time or O(|M|n^4) time with O(n^4) space, where n is the length of the strings and M is the set of matching points between A and B. Then, we show that the problem can also be solved in O(sigma |M|^3 + n) time and O(|M|^2 + n) space, or in O(|M|^3 log^2 n log log n + n) time with O(|M|^3 + n) space, where sigma is the number of distinct characters occurring in A and B. We also study lower bounds for the LCSqS problem for two or more strings.
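
One straightforward way to reach a polynomial bound of the kind quoted above is to try every split of A and B into two halves and take a longest common subsequence of the four resulting pieces. The sketch below follows that idea under the assumption that this is an acceptable brute-force baseline; it is not claimed to be the paper's algorithm, and the names `lcs4` and `lcsqs_length` are hypothetical. It is only practical for short strings.

```python
from functools import lru_cache

def lcs4(w, x, y, z):
    """Length of a longest common subsequence of four strings (memoized recursion)."""
    @lru_cache(maxsize=None)
    def go(a, b, c, d):
        if a == len(w) or b == len(x) or c == len(y) or d == len(z):
            return 0
        best = 0
        if w[a] == x[b] == y[c] == z[d]:
            best = 1 + go(a + 1, b + 1, c + 1, d + 1)
        # Otherwise (or additionally) skip one character in any one string.
        return max(best, go(a + 1, b, c, d), go(a, b + 1, c, d),
                   go(a, b, c + 1, d), go(a, b, c, d + 1))
    return go(0, 0, 0, 0)

def lcsqs_length(A, B):
    """Length of a longest square YY that is a subsequence of both A and B (0 if none)."""
    best = 0
    for i in range(1, len(A)):        # A = A[:i] + A[i:]
        for j in range(1, len(B)):    # B = B[:j] + B[j:]
            # Y must be a subsequence of all four pieces.
            best = max(best, 2 * lcs4(A[:i], A[i:], B[:j], B[j:]))
    return best

print(lcsqs_length("abcabc", "acac"))
```

The split enumeration works because YY is a subsequence of a string exactly when the string can be cut so that Y is a subsequence of each part.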

Dagstuhl Research Online Publication Server

Trepo - Institutional Repository of Tampere University

Streaming and Small Space Approximation Algorithms for Edit Distance and Longest Common Subsequence

Author: Cheng Kuan
Farhadi Alireza
Hajiaghayi MohammadTaghi
Jin Zhengzhong
Li Xin
Rubinstein Aviad
Seddighin Saeed
Zheng Yu
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)
Publication date: 01/01/2021

Dagstuhl Research Online Publication Server

A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics

Author: Atashpendar Arash
Mestel David
Roscoe A. W.
Ryan Peter Y. A.
Publication date: 30/07/2018

From the output produced by a memoryless deletion channel from a uniformly random input of known length $n$, one obtains a posterior distribution on the channel input. The difference between the Shannon entropy of this distribution and that of the uniform prior measures the amount of information about the channel input which is conveyed by the output of length $m$, and it is natural to ask for which outputs this is extremized. This question was posed in a previous work, where it was conjectured on the basis of experimental data that the entropy of the posterior is minimized and maximized by the constant strings $\texttt{000}\ldots$ and $\texttt{111}\ldots$ and the alternating strings $\texttt{0101}\ldots$ and $\texttt{1010}\ldots$ respectively. In the present work we confirm the minimization conjecture in the asymptotic limit using results from hidden word statistics. We show how the analytic-combinatorial methods of Flajolet, Szpankowski and Vallée for dealing with the hidden pattern matching problem can be applied to resolve the case of fixed output length and $n\rightarrow\infty$, by obtaining estimates for the entropy in terms of the moments of the posterior distribution and establishing its minimization via a measure of autocorrelation.

Comment: 11 pages, 2 figures
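
For very small lengths, the posterior entropy discussed above can be computed by brute force. The sketch below assumes independent deletions and a uniform prior over binary inputs, so that the posterior weight of an input is proportional to the number of ways the output embeds in it as a subsequence. The names `count_embeddings` and `posterior_entropy` are hypothetical, and the enumeration is exponential in the input length, so this is an illustration only and not the paper's analytic method.

```python
from itertools import product
from math import log2

def count_embeddings(x, y):
    """Number of ways y occurs as a subsequence of x."""
    ways = [1] + [0] * len(y)
    for ch in x:
        # Go right to left so each character of x is used at most once per embedding.
        for k in range(len(y), 0, -1):
            if y[k - 1] == ch:
                ways[k] += ways[k - 1]
    return ways[len(y)]

def posterior_entropy(y, n):
    """Shannon entropy (bits) of the posterior over binary inputs of length n given output y."""
    counts = [count_embeddings("".join(bits), y) for bits in product("01", repeat=n)]
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Compare a constant output with an alternating one for input length 3.
print(posterior_entropy("00", 3), posterior_entropy("01", 3))
```

For input length 3, the constant output gives the smaller entropy of the two, in line with the direction of the conjecture described in the abstract.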

arXiv.org e-Print Archive

Open Repository and Bibliography - Luxembourg

Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS

Author: Bringmann Karl
Chaudhury Bhaskar Ray
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 38th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2018)
Publication date: 01/01/2018

We study sketching and streaming algorithms for the Longest Common Subsequence problem (LCS) on strings of small alphabet size |Sigma|. For the problem of deciding whether the LCS of strings x,y has length at least L, we obtain a sketch size and streaming space usage of O(L^{|Sigma| - 1} log L). We also prove matching unconditional lower bounds. As an application, we study a variant of LCS where each alphabet symbol is equipped with a weight that is given as input, and the task is to compute a common subsequence of maximum total weight. Using our sketching algorithm, we obtain an O(min{nm, n + m^{|Sigma|}})-time algorithm for this problem, on strings x,y of length n,m, with n >= m. We prove optimality of this running time up to lower order factors, assuming the Strong Exponential Time Hypothesis.
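
The weighted variant described above can be illustrated with the textbook quadratic dynamic program, which corresponds to the nm term in the stated bound. This sketch is not the paper's sketching-based algorithm. It assumes non-negative weights supplied as a dictionary, and the name `weighted_lcs` is hypothetical.

```python
def weighted_lcs(x, y, weight):
    """Maximum total weight of a common subsequence of x and y, in O(len(x) * len(y)) time."""
    n, m = len(x), len(y)
    # dp[i][j] = best weight using the prefixes x[:i] and y[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])
            if x[i - 1] == y[j - 1]:
                best = max(best, dp[i - 1][j - 1] + weight[x[i - 1]])
            dp[i][j] = best
    return dp[n][m]

# Hypothetical weights: the heavy symbol "b" makes "ab" better than "ac".
print(weighted_lcs("abc", "acb", {"a": 1, "b": 5, "c": 2}))
```

With all weights equal to 1, the function returns the ordinary LCS length.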

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

MPG.PuRe